Abstract. This paper is a review of the study of large random 
recurrent neural networks. The connection weights vary according to a 
probability law, and it is possible to predict the network dynamics at a macroscopic 
scale using an averaging principle. After an introductory section, section 2 
reviews the various models from the points of view of the single neuron dynamics 
and of the global network dynamics. A summary of notations is presented, which 
is helpful for the sequel. In section 3, mean-field dynamics is developed and the 
probability distribution characterizing global dynamics is computed. In section 
4, some applications of mean-field theory to the prediction of chaotic regimes for 
Analog Formal Random Recurrent Neural Networks (AFRRNN) are displayed. 
The case of AFRRNN with a homogeneous population of neurons is studied in 
section 4.1. Then, a two-population model is studied in section 4.2. The occurrence 
of a cyclo-stationary chaos is displayed using the results of [16]. In section 5, an 
insight into the application of mean-field theory to IF networks is given using the 
results of [9]. 

1 Introduction 

Recurrent neural networks were introduced to improve biological plausibility of artificial neural 
networks such as perceptrons since they display internal dynamics. They are useful to implement 
associative recall. The first models were endowed with symmetric connexion weights which 
induced relaxation dynamics and equilibrium states [28]. Asymmetric connexion weights were 
introduced later on, enabling the observation of complex dynamics and chaotic attractors. The 
role of chaos in cognitive functions was first discussed by W. Freeman and C. Skarda in seminal 
papers such as [37]. The practical importance of such dynamics is due to the use of on-line 
Hebbian learning to store dynamical patterns. For a review see for instance [26]. 

The nature of the dynamics depends on the connexion weights. When considering large 
neural networks, it is impossible to study the dynamics as a function of the many parameters 
involved in the network dynamics: parameters defining the state of the neuron such as the 
sodium and potassium conductances in Hodgkin-Huxley models; parameters defining the structure 
of the synapses; parameters attached to the environment; external inputs; etc. One may 
consider that the connexion weights share a few values, but this does not allow one to study the 
effect of the variability. Hence, one often considers random models where the connexion 
weights form a random sample of a probability distribution. These models are called "Random 
Recurrent Neural Networks" (RRNN). 

In this context, the parameters of interest are those defining the probability distribution, 
i.e. the statistical parameters (introduced as "macroscopic parameters" in the first paper of this 
review, referred to, from now on, as paper I). Then a study of the dynamics in terms of relevant 
dynamical quantities, called order parameters¹, can be performed via "Dynamic Mean-Field 
Equations" (DMFE). This terminology and method are inherited from statistical physics and 
quantum field theory, though they have to be adapted to the present context. Mean-Field Equations 
were introduced for neural networks by Amari [2] and later on by Crisanti and Sompolinsky [38]. 
Their results were extended in [12] and proved in a rigorous way in [34]. In [34] the authors 
used a "Large Deviation Principle" (LDP) coming from rigorous statistical mechanics [7]. 
Simultaneously, mean-field equations were successfully used to predict the dynamics of spiking 
recurrent neural networks [9,40]. 

This paper intends to provide a bridge between the detailed computation of the asymptotic 
regime and the rigorous aspects of MFE theory. We shall introduce the mathematical basis of 
dynamic mean-field theory in sections 2 and 3, and apply it to several prominent examples. Note 
however that our approach is different from the standard ones, which are based either on the computation 
of a functional generating the cumulants, or on an ad hoc approximation replacing the sum 
of incoming synaptic potentials by a Gaussian random variable. Instead we use large deviations 
techniques (detailed in the appendix). They have the advantage of being rigorous and they allow 
one to prove convergence results stronger than those given by the usual techniques. 

In section 2, the various models are stated from the points of view of the single neuron 
dynamics and of the global network dynamics. A summary of notations is presented, which 
is quite helpful for the sequel. In section 3 mean-field dynamics is developed. The probability 
distribution characterizing global dynamics is computed. The mathematical tools which are used 
there are detailed (without proof) in the appendix. In section 4, some applications of mean-field 
theory to the prediction of chaotic regimes for analog formal random recurrent neural networks 
(AFRRNN) are displayed. The dynamical equation of homogeneous AFRRNN, which is studied 
in paper I, is derived from the random network model in section 4.1. Moreover, a two-population 
model is studied in section 4.2 and the occurrence of a cyclo-stationary chaos is displayed using 
the results of [16]. In section 5, an insight into the application of mean-field theory to IF networks 
is given using the results of [9]. The model of this section is a continuous-time model, following 
the authors of the original paper. Hence the theoretical framework of the beginning of the paper 
has to be enlarged to support this extension of mean-field theory, and this work has still to be 
done. However, we sketch a parallel between the two models to encourage further research. 



2 Dynamics of Random Recurrent Neural Networks. 
2.1 Defining dynamic state variables 

The state of an individual neuron $i$ at time $t$ is described by an instantaneous individual 
variable, the membrane potential $u_i(t)$. In stochastic models, such as the ones considered here, 
all variables (including the $u_i(t)$'s) are (real) random variables² which take their values in $\mathbb{R}$. 
In this section, we consider discrete time dynamics and restrict ourselves to a finite time horizon, 
i.e. we consider time $t$ as an integer belonging to the time interval $\{0, 1, \dots, T\}$ where $T$ is finite. 

Thus an individual state trajectory $u_i = (u_i(t))_{t \in \{0,1,\dots,T\}}$ takes its values in $\mathcal{F} = \mathbb{R}^{\{0,1,\dots,T\}}$. 
Though one is also interested in long-time behaviour and stationary regime (if any), rigorous 
proofs of convergence of large-size networks only exist for finite time. 

We shall study the probability distribution of trajectories instead of the probability of an 
instantaneous state. Actually, the latter can be easily obtained from the former. The first and second 
order moments of an individual trajectory $u_i$ are its expectation $E(u_i) \in \mathcal{F}$ and its covariance 
matrix $\mathrm{Cov}(u_i) \in \mathcal{F} \otimes \mathcal{F}$. 



1 This terminology comes from statistical physics and was introduced by the physicist L. Landau. 
Prominent examples of order parameters are the magnetization in the Ising model or the Edwards-
Anderson parameter in spin-glasses. 

2 Following the standard physicist's habit, the random variables are not denoted by capital letters. 
Our aim is to study the coupled dynamics of $N$ interacting neurons that constitute a neural 
network. $N$ is the size of the neural network. The global state trajectory $u = (u_i)_{i \in \{1,\dots,N\}}$ 
is a random vector in $\mathcal{F}^N$. The probability distribution³ of the random vector $u$, denoted by 
$Q_N$, depends on $N$. We shall compute it for various neuron models in this section. We shall 
first focus on the case of homogeneous neural networks; then more realistic cases, such as networks with several 
populations, will be considered. 

As it was detailed in paper I, the dynamics of the neuron models that we study here depends 
on three crucial points. 

- how does the neuron activation depend on the membrane potential, 

- how do the other neurons contribute to the synaptic potential which summarizes completely 
the influence of the network onto the target neuron, 

- how is the synaptic potential used to update the membrane potential. 

We shall now detail these points in the models that are considered further on. 

2.2 Spike firing modeling 

As introduced in paper I, the neuron is considered active, and emits a spike, whenever 
its membrane potential exceeds the activation threshold. So neuron $i$ is active at time $t$ 
when $u_i(t) > \theta$, where $\theta$ is the neuron activation threshold. We consider here that $\theta$ is constant 
during the evolution and is the same for all neurons. Actually this hypothesis may be relaxed 
and random thresholds may be considered, but the notation and the framework of the dynamical 
study would be more complicated (see [34]). 

For spiking neuron models we define an activation variable $x_i(t)$ which is equal to 1 if neuron 
$i$ emits a spike at time $t$ and to 0 otherwise. Hence we have 

$$x_i(t) = f[u_i(t) - \theta] \qquad (1)$$

where $f$, called the transfer function of the neuron, is here the Heaviside function. Actually, to 
alleviate notations, we shift $u_i$ by $\theta$, which allows us to replace equation (1) by the equation 

$$x_i(t) = f[u_i(t)] \qquad (2)$$

The threshold will be further on taken into account in the updating equation. 

Two spiking neuron models are considered here: the Binary Formal neuron (BF), which is 
the original model of McCulloch and Pitts [32], and the Integrate and Fire neuron (IF), which 
is generally used nowadays to model the dynamics of large spiking neural networks [22]. 

In these models, the neuron activation generally takes two values: 0 and 1. This is true 
for most models of neurons. However, many research works prefer to take into 
account the average firing rate of the model instead of the detailed instants of firing (see section 
5.1, paper I). This point of view simplifies the model, as it deals with smooth functions which are easier 
to handle from a mathematical point of view. In this case, equation (2) is still valid but $x_i(t)$ 
takes its values in the interval $[0, 1]$ and the transfer function $f$ is a smooth sigmoid function, 
for instance $f(x) = \frac{e^x}{1 + e^x}$. Since the activation of the neuron is represented by a real value 
that varies continuously, the model is called the Analog Formal neuron (AF). 

The AF model is still widely dominant when Artificial Neural Networks are considered for 
applications, since gradients are easy to compute. For biological purposes, it was widely believed that 
the relevant information was stored in the firing rate; in that case more precise modeling would 
not be so useful, at least from a functional point of view. 



3 This term is defined in Appendix, definition 7 
The three models are studied in this chapter and we attempt to give a unified presentation 
of the mean-field equations for these three models. Note that, in the sequel, the state of the neuron 
will be the membrane potential $u_i(t)$ and not the activation $x_i(t)$. This is due to the updating 
definition in the IF model. We shall return to that point later on. 



2.3 The synaptic potential of RRNN 

The spikes are used to transmit information to other neurons through the synapses. We shall 
adopt here the very rough, but classical, description of the synapse. Namely, the synaptic 
connexion from neuron $j$ to neuron $i$ is denoted by $J_{ij}$. It can be positive (excitatory) or negative 
(inhibitory). Let us denote by $J = (J_{ij})$ the matrix of synaptic weights. At time 0, the 
dynamical system is initialized and the synaptic potentials are set to zero⁴. 

The synaptic potential of neuron $i$ of a network of $N$ neurons at time $t + 1$ is expressed⁵ as 
a function of $J$ and $u(t) \in \mathbb{R}^N$ by 

$$v_i(J, u)(t + 1) = \sum_{j=1}^{N} J_{ij} f[u_j(t)] \qquad (3)$$

(The notation $v_i(J, u)(t + 1)$ will be explained below). 

As discussed in the introduction, we consider here random models where the connexion 
weights form a random sample of a probability distribution ("Random Recurrent Neural 
Networks" (RRNN)). In that case, the parameters of interest are the statistical parameters defining 
the probability distribution. A standard example considered in this paper is Gaussian 
connexion weights. In this case, the statistical parameters are denoted by⁶ $\bar{J}$ and $J^2$, so that $J$ is a 
normal random matrix with independent components distributed according to the normal law 
$\mathcal{N}\left(\frac{\bar{J}}{N}, \frac{J^2}{N}\right)$. 


Note that the assumption of independence is crucial in the approach described below. 
Unfortunately, the more realistic case where correlations between the $J_{ij}$'s exist (e.g. after Hebbian 
learning) is, currently, out of reach for all the mean-field methods that we know. We shall first 
consider Gaussian synaptic weights, but we shall extend later on the RRNN model properties 
to a more general setting where the weights are non Gaussian and depend on the neuron class 
in a several population model like in [16]. 
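The Gaussian weight model and the synaptic summation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are taken to have mean `j_bar / n` and standard deviation `j / sqrt(n)` (the scaling that keeps the mean and variance of the synaptic potential independent of the network size), and the names `sample_weights` and `synaptic_potential` are hypothetical.

```python
import math
import random


def sample_weights(n, j_bar, j, rng):
    """n x n matrix of i.i.d. Gaussian weights, mean j_bar/n and variance j**2/n."""
    mean = j_bar / n
    std = j / math.sqrt(n)
    return [[rng.gauss(mean, std) for _ in range(n)] for _ in range(n)]


def synaptic_potential(weights, u_t, f):
    """Synaptic potential of every neuron at time t+1: sum_j J_ij * f[u_j(t)]."""
    x_t = [f(u) for u in u_t]  # activations at time t
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x_t)) for row in weights]
```

For a fixed realization of the weights this is the deterministic map studied from the dynamical system point of view; resampling the weights while holding the potentials fixed gives the complementary, probabilistic point of view.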

We have already dealt with the dynamical properties of RRNN such as (3) in paper I, 
considered from the dynamical system point of view, where we fix a realization of the $J_{ij}$'s and 
consider the evolution of trajectories of this dynamical system. Then, we have averaged over the 
$J_{ij}$ distribution in order to get information about the evolution of averaged quantities. In the 
present paper we shall start with a complementary point of view. Namely, assume that we fix the 
trajectory $u_i$ of each neuron (resp. we fix the trajectory $u$ of the network). Then, at each time 
step the variable $\sum_{j=1}^{N} J_{ij} f[u_j(t)]$ is a Gaussian random variable whose probability distribution 
is induced by the distribution of the $J_{ij}$'s. Of course, this distribution depends on the trajectory $u$ 
(for example $E\left[\sum_{j=1}^{N} J_{ij} f[u_j(t)]\right] = \frac{\bar{J}}{N} \sum_{j=1}^{N} f[u_j(t)]$). To emphasize this dependence we shall 
denote by $v_i(J, u) = (v_i(J, u)(t)) \in \mathcal{F}$ the trajectory of the synaptic potential as in (3). 

With this line of reasoning one can show that the $v_i(\cdot, u)$ are (conditionally on $u$) 
Gaussian, identically distributed and independent random vectors in $\mathcal{F}$ (see appendix, proposition 
(16)). The distribution of $v_i$ is therefore⁷ defined by its mean $m_u$ and its covariance matrix $c_u$ 
(depending on $u$). 

4 This may for example correspond to a rest state set to zero without loss of generality (since it 
corresponds to changing the voltage reference for the membrane potential, see paper I). 

5 See section 5 in paper I. 

6 We use here the same notation as in paper I. Recall that the scaling with $1/N$ allows one to have a 
synaptic potential whose mean and variance are independent of $N$. 



We have 

$$m_u(t + 1) = \frac{\bar{J}}{N} \sum_{j=1}^{N} f[u_j(t)] \qquad (4)$$

and 

$$c_u(s + 1, t + 1) = \frac{J^2}{N} \sum_{j=1}^{N} f[u_j(s)] f[u_j(t)] \qquad (5)$$

Notice that these quantities, called order parameters in the sequel, are invariant under any 
permutation of the neuron membrane potentials. Actually, they depend only on the empirical 
distribution⁸ $\mu_u$ associated to $u$. 

Definition 1 The empirical measure is a map from $\mathcal{F}^N$ to $\mathcal{P}(\mathcal{F})$, the set of probability 
measures on $\mathcal{F}$. It is defined by 

$$\mu_u(A) = \frac{1}{N} \sum_{i=1}^{N} \delta_{u_i}(A) \qquad (6)$$

where $\delta_u$ is the Dirac mass at $u$, i.e. $\delta_u(A) = 1$ if $u$ belongs to the set $A$ and 0 otherwise. 

Using this formalism provides a useful way to perform an average over a probability 
distribution on the trajectories $u$. For example, the average $\frac{1}{N} \sum_{i=1}^{N} g(u_i(t))$, where $g$ is some 
function, writes⁹ $\int g(\eta(t))\, d\mu_u(\eta)$. More generally, assume that we are given a probability 
distribution $\mu$ on the space of trajectories $\mathcal{F}$. Then, one can perform a generic construction of a 
Gaussian probability on $\mathcal{F}$. 

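Definition 2 For any $\mu \in \mathcal{P}(\mathcal{F})$ the Gaussian probability distribution $g_\mu$ on $\mathbb{R}^T$, with moments 
$m_\mu$ and $c_\mu$, is defined by: 

$$\begin{cases} m_\mu(t + 1) = \bar{J} \int f[\eta(t)]\, d\mu(\eta) \\ c_\mu(s + 1, t + 1) = J^2 \int f[\eta(s)] f[\eta(t)]\, d\mu(\eta) \end{cases} \qquad (7)$$

Then, it is easy to reformulate the previous computation as: 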

Proposition 1 The common probability distribution of the individual synaptic potential 
trajectories $v_i(\cdot, u)$ is the normal distribution $g_{\mu_u}$ where $\mu_u$ is the empirical distribution of the 
network potential trajectory $u$. 

This framework is useful to compute the large-size limit of the common probability 
distribution of the potential trajectories. 
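As an illustration of how the order parameters depend on the network trajectory only through its empirical measure, here is a minimal Python sketch. It assumes that trajectories are stored as plain lists `[u_i(0), ..., u_i(T)]` and that `j_bar` and `j_sq` stand for the mean and variance parameters of the weights; all names are hypothetical.

```python
def empirical_average(trajectories, g):
    """Integral of g against the empirical measure, i.e. (1/N) * sum_i g(u_i)."""
    return sum(g(u_i) for u_i in trajectories) / len(trajectories)


def order_parameters(trajectories, f, j_bar, j_sq):
    """Mean and covariance of the synaptic potential, equations (4) and (5).

    trajectories: list of N lists [u_i(0), ..., u_i(T)].
    Returns (m, c) with keys t+1 and (s+1, t+1) for s, t in 0..T-1.
    """
    horizon = len(trajectories[0]) - 1
    m = {}
    c = {}
    for t in range(horizon):
        m[t + 1] = j_bar * empirical_average(trajectories, lambda u: f(u[t]))
        for s in range(horizon):
            c[(s + 1, t + 1)] = j_sq * empirical_average(
                trajectories, lambda u: f(u[s]) * f(u[t])
            )
    return m, c
```

Permuting the neurons leaves the result unchanged, which is the invariance property noted after equation (5).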



2.4 Dynamical models of the membrane potential 

We shall now detail the updating rule of the membrane potential. Various neural dynamics 
have been detailed in paper I. We focus here on Analog Formal (AF), Binary Formal (BF), and 
Integrate and Fire neuron (IF). 

In any case, the network is initialized with independent identically distributed membrane 
potentials according to a probability distribution $m_0 \in \mathcal{P}(\mathbb{R})$. It is useful, on technical grounds, 
to add to each neuron a small amount of noise. Thus we introduce for each neuron $i$ a sequence 
$(w_i(t))_{t \in \{1,\dots,T\}}$ of i.i.d. centered Gaussian variables of variance $\sigma^2$. This sequence is called the 
synaptic noise of neuron $i$. The synaptic noise plays an important part in the mathematical 
proof, but the order parameter $\sigma$ can be taken as small as necessary, so this model is not very restrictive. 
The synaptic noise is added to the synaptic potential. On biological grounds it may account for 
different effects such as the diffusion of neurotransmitters involved in the synaptic transmission, 
the degrees of freedom neglected by the model, external perturbations, etc. Though it is not 
evident that the "real noise" is Brownian, using this kind of perturbation has the advantage of 
providing a tractable model where standard theorems in the theory of stochastic processes or 
methods in non equilibrium statistical physics (e.g. Fokker-Planck equations) can be applied. 
In some papers, the synaptic noise is called thermal noise or annealed noise by comparison 
with the random variables $J = (J_{ij})$, which are called quenched variables as they are fixed 
once and for all and do not change with time (we do not consider learning in this paper). 

7 This distribution actually does not depend on $i$ since the $v_i$'s are identically distributed. 

8 This concept is introduced in more detail in the appendix, definition (18). 

9 Note that the integration variable $\eta$ corresponds to a trajectory of the network, and that $\eta(t)$ 
corresponds to the state of the network at time $t$ (i.e. $\eta_i(t)$ is the state of neuron $i$ at time $t$). 

The formal neuron updates its membrane potential according to 

$$u_i(t + 1) = v_i(t + 1) + w_i(t + 1) - \theta \qquad (8)$$

The IF neuron takes into account its present membrane potential while updating. Its evolution 
equation is 

$$u_i(t + 1) = \varphi[u_i(t) + \theta] + v_i(t + 1) + w_i(t + 1) - \theta \qquad (9)$$

where 

- $\varphi$ is defined by 

$$\varphi(u) = \begin{cases} \gamma u & \text{if } \vartheta < u < \theta \\ \vartheta & \text{else} \end{cases} \qquad (10)$$

- $\gamma \in\, ]0, 1[$ is the leak (damping coefficient). 

- $\vartheta$ is the reset potential and $\vartheta < 0 < \theta$. 
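The two update rules can be sketched in Python as follows. This is a minimal sketch which reads equation (10) as "multiply by the leak when the potential lies strictly between the reset potential and the threshold, otherwise return the reset potential"; the function and argument names are illustrative only.

```python
def leak_reset(u, gamma, theta, reset):
    """Equation (10): gamma * u if reset < u < theta, else the reset potential."""
    return gamma * u if reset < u < theta else reset


def formal_update(v_next, w_next, theta):
    """Equation (8): the formal neuron (BF or AF) forgets its present potential."""
    return v_next + w_next - theta


def if_update(u_now, v_next, w_next, theta, gamma, reset):
    """Equation (9): the IF neuron leaks or resets its present potential first."""
    return leak_reset(u_now + theta, gamma, theta, reset) + v_next + w_next - theta
```

The only difference between the two models is the memory term `leak_reset(u_now + theta, ...)`, which is why the membrane potential, and not the activation, has to be taken as the state variable.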

The following table summarizes the main properties of the three models we investigate: 

Transfer function     Heaviside    sigmoidal 
Formal model          BF           AF 
Integrate and Fire    IF 


Assume for a moment that we remove the neural coupling. Then the individual neuron state 
trajectories are independent, identically distributed random vectors in $\mathcal{F}$ (whose randomness 
is induced by the Brownian noise). The corresponding dynamics is called the free dynamics. 
Let us denote by $P$ the common distribution of the neuron trajectories in the uncoupled case. 
The probability distribution of the corresponding neural network trajectory is therefore $P^{\otimes N}$. 

In the case of formal neurons, the free dynamics equation is 

$$u_i(0) \sim m_0, \quad u_i(t + 1) = w_i(t + 1) - \theta \qquad (11)$$

where $\sim$ means "distributed according to". So $P = m_0 \otimes \mathcal{N}(-\theta, \sigma^2)^{\otimes T}$. In the case of IF neurons 
$P$ is not explicit. It is the image of $m_0 \otimes \mathcal{N}(-\theta, \sigma^2)^{\otimes T}$ by the diffusive¹⁰ dynamics 

$$u_i(0) \sim m_0, \quad u_i(t + 1) = \varphi[u_i(t) + \theta] + w_i(t + 1) - \theta \qquad (12)$$

When coupling the neurons, the trajectory of the system of neurons is still a random vector. 
Its probability distribution, denoted by $Q_N$, has a density with respect to $P^{\otimes N}$ that can be 
explicitly computed. This is the main topic of the next subsection. 

10 This is indeed a discrete time stochastic difference equation with drift $\varphi[u_i(t) + \theta] - \theta$ and diffusion 
$w_i(t + 1)$. Incidentally, we shall come back to stochastic differential equations for neural networks in 
section 5 and we shall consider the related Fokker-Planck equation. 
2.5 Computation of the probability distribution of network trajectories. 

This section is devoted to the computation of the probability distribution $Q_N$. The result 
shows that the density of $Q_N$ with respect to the free dynamics probability $P^{\otimes N}$ depends on 
the trajectory variable $u$ only through the empirical measure $\mu_u$. To achieve this computation 
we shall use a key result of stochastic process theory, the Girsanov theorem [23,25], which gives 
the density of the new distribution of a diffusion when the drift is changed. Actually, since the 
time set is finite, the version of the Girsanov theorem that we use is different from the original one 
[25] and may be recovered by an elementary Gaussian computation. Its derivation is detailed in 
the appendix in theorem 19. A similar result may be obtained for continuous time dynamics 
using the classical Girsanov theorem (see [7]). 

Let us state the finite-time Girsanov theorem. 

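Theorem 2 Let $m_0$ be a probability measure on $\mathbb{R}^d$ and let $\mathcal{N}(\alpha, K)$ be a regular Gaussian 
probability on $\mathbb{R}^d$ with mean $\alpha$ and covariance matrix $K$. Let $T$ be a positive integer and 
$\mathcal{E}_T = (\mathbb{R}^d)^{\{0,\dots,T\}}$ be the space of finite time trajectories in $\mathbb{R}^d$. Let $w$ be a Gaussian 
random vector in $\mathcal{E}_T$ with distribution $m_0 \otimes \mathcal{N}(\alpha, K)^{\otimes T}$. Let $\phi$ and $\psi$ be two measurable maps 
of $\mathbb{R}^d$ into $\mathbb{R}^d$. Then we define the random vectors $x$ and $y$ in $\mathcal{E}_T$ by: 

$$\begin{cases} x(0) = w(0) \\ x(t + 1) = \phi[x(t)] + w(t + 1) \end{cases} \qquad (13)$$

$$\begin{cases} y(0) = w(0) \\ y(t + 1) = \psi[y(t)] + w(t + 1) \end{cases} \qquad (14)$$

Let $P$ and $Q$ be the respective probability distributions on $\mathcal{E}_T$ of $x$ and $y$; then we have: 

$$\frac{dQ}{dP}(\eta) = \exp \sum_{t=0}^{T-1} \left\{ -\frac{1}{2} \{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1} \{\psi[\eta(t)] - \phi[\eta(t)]\} + \{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1} \{\eta(t + 1) - \alpha - \phi[\eta(t)]\} \right\} \qquad (15)$$

namely $Q$ is absolutely continuous with respect to $P$. 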
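The remark that the finite-time version follows from an elementary Gaussian computation can be checked numerically in the scalar case. The Python sketch below (an illustration, with hypothetical function names, assuming dimension 1 and a scalar variance `var` in place of the covariance matrix) evaluates the logarithm of the density once from the formula of the theorem and once directly as a ratio of Gaussian transition densities; both trajectories start from the same initial law, so the initial term cancels.

```python
import math


def log_gauss_pdf(x, mean, var):
    """Logarithm of the N(mean, var) density at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_density_formula(eta, phi, psi, alpha, var):
    """log dQ/dP(eta) from the finite-time Girsanov formula, scalar case."""
    total = 0.0
    for t in range(len(eta) - 1):
        delta = psi(eta[t]) - phi(eta[t])             # difference of the drifts
        innovation = eta[t + 1] - alpha - phi(eta[t])  # centered noise under P
        total += (-0.5 * delta * delta + delta * innovation) / var
    return total


def log_density_direct(eta, phi, psi, alpha, var):
    """Same quantity, as a sum of log-ratios of Gaussian transition densities."""
    total = 0.0
    for t in range(len(eta) - 1):
        total += log_gauss_pdf(eta[t + 1], psi(eta[t]) + alpha, var)
        total -= log_gauss_pdf(eta[t + 1], phi(eta[t]) + alpha, var)
    return total
```

The two functions agree up to rounding for any trajectory `eta`, which is the expansion of the two quadratic forms in the Gaussian exponents.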

We shall use this theorem to prove the following: 

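Theorem 3 The density of the distribution of the network membrane potential $Q_N$ with respect 
to $P^{\otimes N}$ is given by 

$$\frac{dQ_N}{dP^{\otimes N}}(u) = \exp N \Gamma(\mu_u) \qquad (16)$$

where the functional $\Gamma$ is defined on $\mathcal{P}(\mathcal{F})$ by: 

$$\Gamma(\mu) = \int \log \left\{ \int \exp \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \left[ -\frac{1}{2} \xi(t + 1)^2 + \Phi_{t+1}(\eta)\, \xi(t + 1) \right] dg_\mu(\xi) \right\} d\mu(\eta) \qquad (17)$$

with: 

- for AF and BF models: $\Phi_{t+1}(\eta) = \eta(t + 1) + \theta$ 

- for the IF model: $\Phi_{t+1}(\eta) = \eta(t + 1) + \theta - \varphi[\eta(t) + \theta]$ 

Remark 1 Let us recall that the Gaussian measure $g_\mu$ has been defined previously (Definition 2). 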

PROOF OF THEOREM: Call $Q_N(J)$ the conditional distribution of the network state 
trajectory given $J$, the matrix of synaptic weights. We shall apply the finite-time Girsanov theorem 
2 to express $\frac{dQ_N(J)}{dP^{\otimes N}}$. To apply the theorem we notice that 

- The difference of the two drift terms $\psi[\eta(t)] - \phi[\eta(t)]$ of the theorem is here the synaptic 
potentials $(v_i(t))$. The synaptic potentials $v_i$ are functions of the $u_i$'s according to (3) 

$$v_i(J, u)(t + 1) = \sum_{j=1}^{N} J_{ij} f[u_j(t)]$$

- The expression of the synaptic noise $(w_i(t + 1))$, as a function of the state trajectory $u_i$ in the 
free dynamics, is given by $\Phi_{t+1}(u_i)$. The explicit form of $\Phi$ depends on the neuron model 
(formal or IF). 

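We thus have 

$$\frac{dQ_N(J)}{dP^{\otimes N}}(u) = \exp \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \sum_{i=1}^{N} \left[ -\frac{1}{2} v_i(J, u)(t + 1)^2 + v_i(J, u)(t + 1)\, \Phi_{t+1}(u_i) \right] \qquad (18)$$

$$= \prod_{i=1}^{N} \exp \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \left[ -\frac{1}{2} v_i(J, u)(t + 1)^2 + v_i(J, u)(t + 1)\, \Phi_{t+1}(u_i) \right] \qquad (19)$$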



We have thus obtained the probability distribution of the membrane potentials in the 
coupled neuron models, but for a fixed realization of the $J_{ij}$'s (conditional probability). 

Let us now consider the probability of the quenched variables $J = (J_{ij})$. We observed 
previously, when we introduced the synaptic potential model, that under the configuration 
distribution of $J$, the random vectors $v_i(J, u)$ are independent and identically distributed according 
to the normal distribution $g_{\mu_u}$. To compute the density of $Q_N$ with respect to $P^{\otimes N}$ one has 
thus to average the conditional density $\frac{dQ_N(J)}{dP^{\otimes N}}$ over the distribution of $J$. Since the $v_i$'s are 
independent, the integration separates into products and one gets from (19) the following 

$$\frac{dQ_N}{dP^{\otimes N}}(u) = \exp \sum_{i=1}^{N} \log \int \exp \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \left[ -\frac{1}{2} \xi(t + 1)^2 + \Phi_{t+1}(u_i)\, \xi(t + 1) \right] dg_{\mu_u}(\xi)$$

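The sum over $i$ is equivalent to an integration over the empirical measure $\mu_u$ (6), so we have 

$$\frac{dQ_N}{dP^{\otimes N}}(u) = \exp N \int \log \left\{ \int \exp \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \left[ -\frac{1}{2} \xi(t + 1)^2 + \Phi_{t+1}(\eta)\, \xi(t + 1) \right] dg_{\mu_u}(\xi) \right\} d\mu_u(\eta)$$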

Remark. These equations are reminiscent of the generating functional approach derived e.g. by 
Sompolinsky et al. [38,13] or Molgedey et al. [31], which allows one to compute the moments of the $u_i(t)$'s. 
However, the present approach provides a stronger result. While the generating functional 
method deals with weak convergence (convergence of the generating function), the method 
developed here allows one to obtain directly the probability distribution of the $u_i(t)$'s. Moreover, by 
using large deviations techniques one is able to establish almost-sure convergence results (valid 
for only one typical sample). 

Let us now state an important corollary of this theorem. 

Corollary 4 The empirical measure $\mu_u$ is a random measure governed by $Q_N$. It has a density 
with respect to the distribution of the empirical measure of the free model (that is governed by 
$P^{\otimes N}$), given by: 

$$\mu \in \mathcal{P}(\mathcal{F}) \to \exp N \Gamma(\mu)$$
2.6 Summary of notations 

Let us recall the notations of this section. They will be extensively used in the following sections: 

Notation                                 Interpretation 
$i \in \{1, \dots, N\}$                  individual neuron in an N neuron population 
$t \in \{0, \dots, T\}$                  time label of the discrete time dynamics at horizon T 
$u_i(t)$                                 membrane potential of neuron i at time t 
$x_i(t)$                                 activation state of neuron i at time t 
$v_i(t)$                                 synaptic potential of neuron i at time t 
$w_i(t)$                                 synaptic summation noise of neuron i at time t 
$u_i \in \mathcal{F}$                    membrane potential trajectory of neuron i from time 0 to time T 
$u \in \mathcal{F}^N$                    network membrane potentials trajectory (network trajectory) from time 0 to time T 
$x_i \in \mathcal{E}$                    activation state trajectory of neuron i from time 0 to time T 
$x \in \mathcal{E}^N$                    network activation state trajectory from time 0 to time T 
$\theta$                                 common firing threshold of individual neurons 
$\sigma$                                 common standard deviation of the synaptic noise 
$\gamma$                                 leak current factor for the Integrate and Fire (IF) neuron model 
$f$                                      neuron transfer function converting membrane potential into activation state 
$J_{ij}$                                 synaptic weight from neuron j to neuron i (real random variable) 
$J = (J_{ij})$                           synaptic weight matrix (random N x N matrix) 
$\bar{J}/N$                              expectation of synaptic weights 
$J^2/N$                                  variance of synaptic weights 
$\mu \in \mathcal{P}(\mathcal{F})$       generic probability distribution of individual membrane potential trajectory 
$\eta \in \mathcal{F}$                   random vector which takes its values in $\mathcal{F}$ under probability distribution $\mu$ 
$P \in \mathcal{P}(\mathcal{F})$         probability distribution of individual membrane potential trajectory for free dynamics 
$g_\mu \in \mathcal{P}(\mathcal{F})$     synaptic potential distribution obtained from $\mu \in \mathcal{P}(\mathcal{F})$ through central limit approximation 
$Q_N \in \mathcal{P}(\mathcal{F}^N)$     probability distribution of network membrane potential trajectory u 


3 The mean-field dynamics 

3.1 Introduction to mean-field theory 

The aim of this section is to describe the evolution of a typical neuron in the limit of large 
size networks. This is done by summarizing, in a single term, the effect of the interactions 
of this neuron with the other neurons of the network. A mean-field theory provides evolution 
equations of the same type as the free dynamics equation, involving single neuron dynamics, 
but where an effective interaction term remains. This term, or "mean-field", is properly the 
average effect of all the interactions of the other neurons with the neuron of interest. So the 
mean-field dynamics is intermediate between the detailed dynamics, which takes into account all the 
detailed interactions between neurons, and the free dynamics, which neglects all interactions. 

To derive mean-field equations in a direct way, one can replace $v_i(t)$ by an approximation 
which depends only on the statistical distribution of the $u_j(t)$'s. This approximation takes 
advantage of the large number of synapses $J_{ij}$ to postulate the vanishing of individual correlations 
between neurons or between neurons and configuration variables. This is the hypothesis of 
"local chaos" of Amari ([1],[2]), or of "vanishing correlations", which is usually invoked to support 
mean-field equations. In the present context, it can be stated as follows. 

In the large size limit, the $u_j$'s are asymptotically independent; they are also independent 
from the configuration parameters. 
This approach is very similar to the "molecular chaos hypothesis"¹¹ introduced 
by Boltzmann (1872), tailored to neural network dynamics. 

From the central limit theorem we are then allowed to state that the random variable 
$\xi(t + 1) = \sum_{j=1}^{N} J_{ij} f(u_j(t))$ is a large sum of approximately independent identically distributed 
variables, and thus that it has approximately a Gaussian distribution. Thus, we just have to 
derive its first and second order moments from the common probability distribution of $u_i = 
(u_i(t))$ to know completely the distribution of $\xi$. Hence, from a probability distribution 
on $\mathcal{F}$ which is supposed to be the common probability distribution of the $u_j$'s, we are able to 
derive the distribution of $\xi$ and then the distribution of the resulting potential trajectory and 
the state trajectory of a generic vector of the network. 

The assumption on which the mean-field approximation is based may look entirely wrong at 
first glance. However, in the present context it gives exactly the same results as more elaborate 
methods such as the generating functional method, or the large deviations approach developed 
below. Moreover, it is supported by the "propagation of chaos" result proved in section 3.4. Note 
however that in models with correlated interactions (such as spin-glasses, where Jij = Jji) 
the "local chaos" hypothesis leads to wrong results (at low temperature) while generating 
functional methods [39,15,14] and large deviations techniques [7] can still be used. 



3.2 Mean-field propagation operator and mean-field equation 

We now define an evolution operator $L$, on the set $\mathcal{P}(\mathcal{F})$ of probability distributions on $\mathcal{F}$, that 
we call the mean-field propagation operator (or mean-field propagator). Let $\mu \in \mathcal{P}(\mathcal{F})$ be a 
probability measure on $\mathcal{F}$. Let us compute the moments of 

$$\forall t \in \{0, 1, \dots, T - 1\}, \quad \xi(t + 1) = \sum_{j=1}^{N} J_{ij} f[u_j(t)]$$

where the $u_j$'s are independent identically distributed random vectors with probability 
distribution $\mu$. They are also independent from the configuration parameters $J_{ij}$. 

Since $E[J_{ij}] = \frac{\bar{J}}{N}$ and $\mathrm{Var}[J_{ij}] = \frac{J^2}{N}$, we have 

$$\begin{cases} E[\xi(t + 1)] = \bar{J} \int_{\mathcal{F}} f[\eta(t)]\, d\mu(\eta) \\ \mathrm{Cov}[\xi(s + 1), \xi(t + 1)] = J^2 \int_{\mathcal{F}} f[\eta(s)] f[\eta(t)]\, d\mu(\eta) \end{cases} \qquad (20)$$

Notice that the expression of the covariance is asymptotic since the sum of squares of the expectations 
of the synaptic weights may be neglected. So $\xi$ is a Gaussian random vector in $\mathcal{F}$ with probability 
distribution $g_\mu$ (see definition 2). 
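The size of the neglected term can be made explicit. For independent weights with mean $\bar{J}/N$ and variance $J^2/N$, and i.i.d. trajectories independent of the weights, a direct computation gives the exact finite-$N$ covariance as the limit value plus $(\bar{J}^2/N)$ times the covariance of $f[\eta(s)]$ and $f[\eta(t)]$ under the trajectory distribution. The Python sketch below evaluates both for a distribution with finite support; the representation of that distribution as a list of `(probability, trajectory)` pairs and the function name are assumptions made for the illustration.

```python
def xi_moments(mu, f, j_bar, j_sq, n, s, t):
    """Mean of xi(t+1), exact finite-n covariance of xi(s+1) and xi(t+1), and its limit.

    mu: finite list of (probability, trajectory) pairs.
    """
    e_s = sum(p * f(eta[s]) for p, eta in mu)
    e_t = sum(p * f(eta[t]) for p, eta in mu)
    e_st = sum(p * f(eta[s]) * f(eta[t]) for p, eta in mu)
    mean_t = j_bar * e_t
    limit_cov = j_sq * e_st
    # Term coming from the squared expectations of the weights, of order 1/n.
    correction = (j_bar ** 2 / n) * (e_st - e_s * e_t)
    return mean_t, limit_cov + correction, limit_cov
```

The correction vanishes as `n` grows, and it is identically zero when the weights are centered (`j_bar = 0`).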

Definition 3 Let $\mu$ be a probability distribution on $\mathcal{F}$ such that the distribution of the first 
component is $m_0$. Let $u$, $w$, $v$ be three independent random vectors with the following distributions: 

- the distribution of $u$ is $\mu$, 

- the distribution of $w$ is $\mathcal{N}(0, \sigma^2 I_T)$, 

- the distribution of $v$ is $g_\mu$. 



11 The word "chaos" is somewhat confusing here, especially because we also dealt with deterministic 
chaos in paper I. Actually, "deterministic chaos" and the related exponential correlation decay can 
be invoked, in statistical physics, to obtain (deterministic) equations for the mean value and Gaussian 
fluctuations of relevant observables. In the present case the basic reason that makes the mean-field 
approach "work" is however different. It is the mere fact that the model is fully connected and 
that the $J_{ij}$'s are independent and vanishing in the limit $N \to \infty$. This is a standard result in statistical 
physics models such as the Curie-Weiss model, but obtaining it for the trajectories of a dynamical 
model with quenched disorder requires more elaborate techniques. 
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Then L(u) is the probability distribution on T of the random vector d which is defined by 

ftf(0) = u(0) ( 
\0(t + l)=v(t + l)+w(t + l)-6 { > 

for the formal neuron models (BF and AF), and by 

\ 0(t + 1) = <p[u{t) + 6)} + v(t + 1) + w(t + i)-e [ZZ) 
for the IF neuron model. The operator L which is defined on V{T) is called the mean- field propagation operator. 
Definition 4 The following equation on fj, G V{T) 

L(p) = a (23) 
is called the mean-field equation (MFE) 

Remark 2 The mean-field equation is the achievement of the mean-field approach. To determine
the distribution of an individual trajectory, we suppose that this distribution governs the
interaction of all the units onto the selected one. The resulting distribution of the selected unit has
to be the same as the generic distribution. This is summarized in the mean-field
equation

$$L(\mu) = \mu$$

Equations (21) (resp. (22) for the IF model) with the specification of the probability
distributions define the mean-field dynamics. Actually, the distribution $L(\mu)$ is just the convolution
of the probability distribution $P$ and the Gaussian distribution $g_\mu$. More precisely, if we apply
the discrete time Girsanov theorem 19 of the appendix, we have:

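Theorem 5 $L(\mu)$ is absolutely continuous with respect to $P$ and its density is given by

$$\frac{dL(\mu)}{dP}(\eta) = \int \exp\left\{ \frac{1}{\sigma^2} \sum_{t=0}^{T-1} \left[ -\frac{1}{2}\, \xi(t+1)^2 + \Phi_{t+1}(\eta)\, \xi(t+1) \right] \right\} dg_\mu(\xi) \qquad (24)$$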



PROOF : The proof is essentially a simplified version of the application of the finite-time 
Girsanov theorem which was used to prove theorem (3). The conditioning is done here with 
respect to v which is the difference between the drift terms of the free dynamics and of the 
mean-field dynamics. ■ 

Remark 3 We have to notice for further use that

$$\Gamma(\mu) = \int \log \frac{dL(\mu)}{dP}(\eta)\, d\mu(\eta) \qquad (25)$$

In all the cases, for $0 < t < T$, the projection of the distributions $g_\mu$ and $L(\mu)$ on the $t+1$
first time steps depends only on the projection of $\mu$ on the $t$ first instants. Since the projection
of $\mu$ on the initial instant is always $m_0$, the projection of $L(\mu)$ on the two first instants $\{0, 1\}$
depends only on $m_0$ and similarly, the projection of $L^t(\mu)$ on the $t+1$ first instants $\{0, 1, \ldots, t\}$
depends only on $m_0$. Eventually $\mu_T = L^T(\mu) = L^T(P)$ depends only on $m_0$ and it is the only
fixed point of the mean-field propagation operator $L$.

So we have shown the following

Theorem 6 The probability measure $\mu_T = L^T(P)$ is the only solution of the mean-field equation
with initial condition $m_0$.




3.3 Large Deviation Principle for RRNN mean-field theory 

In this section, we fully use the computation results of the previous section to show the rigorous 
foundations of mean-field theory for RRNN. The approach is the following: 

(a) The empirical measure $\mu_N$ of the network dynamics satisfies a large deviation principle
(LDP) under $P^{\otimes N}$ with a good rate function $\mu \in \mathcal{P}(\mathcal{F}) \to I(\mu, P) \in \mathbb{R}^+$, the relative
entropy between $\mu$ and $P$. Actually, when the size of the network tends to infinity, the
empirical measure converges in distribution exponentially fast towards $P$. The definition of
LDP and its consequences are outlined in the appendix in definition 3.3. Sanov theorem is
stated in the appendix, theorem 24.

(b) According to corollary 4, the density of the new distribution of $\mu_N$ with respect to the
original distribution, when we switch from $P^{\otimes N}$, which governs the free dynamics, to $Q_N$,
which governs the RRNN dynamics, is $\exp N\Gamma(\mu)$.

(c) Combining (a) and (b), one obtains that under $Q_N$, the sequence $\mu_N$ satisfies a LDP with
the good rate function

$$H(\mu) = I(\mu, P) - \Gamma(\mu) \qquad (26)$$

This kind of result is used in statistical physics under the name of Gibbs variational principle 
[21]. The functional H is called, in statistical physics, a thermodynamic potential (e.g. free 
energy or Gibbs potential). Notice that the classical statistical mechanics framework is 
relative to equilibrium probability distributions on the space of microscopic states. It is 
applied here to trajectories. For that reason, this approach is called the dynamic mean-field 
theory [38]. It is quite technical to support it rigorously. One has to show that H is lower 
semi-continuous and is a good rate function (see Varadhan's theorem 23 of the appendix). 
This kind of proof is rather technical. To reduce the size of the paper we admit the following 
result (see [7] for a general approach and [34] for the proof for AFRRNN model) 

Theorem 7 Under the respective distributions $Q_N$, the family of empirical measures $(\mu_N)$
of $\mathcal{P}(\mathcal{F})$ satisfies a full large deviation principle with a good rate function $H$ given by (26).

(d) It is clear from remark 3 that $H(\mu_T) = 0$, where $\mu_T$ is the unique solution of MFE with
initial condition $m_0$, so it is the fixed point of $L$. Thus $\mu_T$ is a minimum of $H$.

The basic computation is the following: first we apply the definition 19 of the relative entropy
that is given in the appendix

$$I(\mu_T, P) = \int \log \frac{d\mu_T}{dP}(\eta)\, d\mu_T(\eta)$$

Since $\mu_T$ is the solution of MFE, we have

$$\frac{d\mu_T}{dP}(\eta) = \frac{dL(\mu_T)}{dP}(\eta)$$

then we apply the previous remark 3 which states

$$\Gamma(\mu_T) = \int \log \frac{dL(\mu_T)}{dP}(\eta)\, d\mu_T(\eta)$$

to check

$$I(\mu_T, P) = \Gamma(\mu_T) \Rightarrow H(\mu_T) = 0$$

(e) To obtain the exponential convergence of the sequence of empirical measures $\mu_N$ under $Q_N$
when $N \to \infty$, one has eventually to show that $H(\mu) = 0 \Rightarrow \mu = \mu_T$. This point is technical
too. It is proved in a similar, still more general framework (continuous time) in [7] using a
Taylor expansion. The same method is applied to show the uniqueness for the AFRRNN
model in [34].

Thus, we have the main result of that section: 

Theorem 8 When the size $N$ of the network goes to infinity, the sequence of empirical measures
$(\mu_N)$ converges in probability exponentially fast towards $\mu_T$, which is the unique solution
of the mean-field equation $L(\mu) = \mu$.

3.4 Main results of RRNN mean-field theory

First notice that theorem 8 may be extended to RRNN with fast decreasing connection weights 
distribution. More precisely, assume that the common distribution of the connexion weights 
satisfies the following: 

Hypothesis 9 (H) If, for all $N$, the common probability law $\nu_N$ of the connexion weights satisfies

(i) $\int w\, d\nu_N(w) = \bar{J}/N$

(ii) $\int w^2\, d\nu_N(w) = J^2/N + \bar{J}^2/N^2$

(iii) $\exists a > 0$, $\exists D > 0$ such that $\int \exp(aNw^2)\, d\nu_N(w) < D$

then the family $(\nu_N)$ is said to satisfy hypothesis (H).
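As an illustration of condition (iii), the following sketch evaluates the exponential moment in closed form for Gaussian weights. It relies on the standard identity $E[\exp(\lambda W^2)] = (1 - 2\lambda s^2)^{-1/2} \exp\left(\lambda m^2 / (1 - 2\lambda s^2)\right)$ for $W \sim \mathcal{N}(m, s^2)$ and $2\lambda s^2 < 1$. The Gaussian choice, the function name `gaussian_exp_moment` and the numerical values are assumptions made for the example.

```python
import math

def gaussian_exp_moment(a, N, J_bar, J):
    """E[exp(a*N*w^2)] for w ~ Normal(J_bar/N, J^2/N); finite only if 2*a*J^2 < 1."""
    shrink = 1.0 - 2.0 * a * J**2
    if shrink <= 0.0:
        raise ValueError("need a < 1/(2 J^2) for a finite exponential moment")
    return math.exp(a * J_bar**2 / (N * shrink)) / math.sqrt(shrink)

J_bar, J = 1.0, 1.5
a = 0.5 / (2.0 * J**2)  # half of the critical value 1/(2 J^2)
moments = [gaussian_exp_moment(a, N, J_bar, J) for N in (1, 10, 100, 1000, 10000)]
D = moments[0] + 1e-9   # the N = 1 value bounds all the others
print(moments, D)
```

With these values the moment decreases with $N$ towards $(1 - 2aJ^2)^{-1/2}$, so a single constant $D$ works for every $N$, which is what (iii) requires.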

Then, from theorem 8 two important results can be deduced rigorously. The first one is a 
"propagation of chaos" result which supports the basic intuition of mean field theory about 
the asymptotic independence of finite subsets of individuals when the population size grows to 
infinity. 

Theorem 10 Let $k$ be a positive integer and $(f_i)_{i \in \{1, \ldots, k\}}$ be $k$ continuous bounded functions
on $\mathcal{F}$. When the size $N$ of the network goes to infinity,

$$\int \prod_{i=1}^{k} f_i(u_i)\, dQ_N(u) \to \prod_{i=1}^{k} \int f_i(\eta)\, d\mu_T(\eta)$$

PROOF : The idea of the proof is due to Sznitman [41].

First, a straightforward consequence of theorem 8 is that when we apply the sequence of
random measures $(\mu_N)$ to the test function $F$ on $\mathcal{P}(\mathcal{F})$ defined by $F(\mu) = \prod_{i=1}^{k} \int f_i(u)\, d\mu(u)$
we get the convergence

$$\lim_{N \to \infty} \int \prod_{i=1}^{k} \left[ \frac{1}{N} \sum_{j=1}^{N} f_i(u_j) \right] dQ_N(u) = \prod_{i=1}^{k} \int f_i(\eta)\, d\mu_T(\eta)$$

Thus it remains to compare $\int \prod_{i=1}^{k} \left[ \frac{1}{N} \sum_{j=1}^{N} f_i(u_j) \right] dQ_N(u)$ and $\int \prod_{i=1}^{k} f_i(u_i)\, dQ_N(u)$. From
the symmetry property of $Q_N$, it is clear that for any subset $\{j_1, \ldots, j_k\}$ of $k$ neurons among $N$,
we have

$$\int \prod_{i=1}^{k} f_i(u_{j_i})\, dQ_N(u) = \int \prod_{i=1}^{k} f_i(u_i)\, dQ_N(u)$$

If we develop $\int \prod_{i=1}^{k} \left[ \frac{1}{N} \sum_{j=1}^{N} f_i(u_j) \right] dQ_N(u)$, we get

$$\int \prod_{i=1}^{k} \left[ \frac{1}{N} \sum_{j=1}^{N} f_i(u_j) \right] dQ_N(u) = \frac{1}{N^k} \sum_{(j_1, \ldots, j_k)} \int \prod_{i=1}^{k} f_i(u_{j_i})\, dQ_N(u) \qquad (27)$$

The average in (27) is taken over all maps from $\{1, \ldots, k\}$ to $\{1, \ldots, N\}$. The
equality is proved if we replace it by the average over all injections from $\{1, \ldots, k\}$ to $\{1, \ldots, N\}$,
since the terms are all equal for injections. But when $N$ goes to infinity the proportion of
injections, which is $\frac{N!}{(N-k)!\, N^k}$, goes to 1 and thus the contribution of repeated $k$-tuples is negligible
when $N$ is large. Therefore

$$\lim_{N \to \infty} \left[ \int \prod_{i=1}^{k} \left( \frac{1}{N} \sum_{j=1}^{N} f_i(u_j) \right) dQ_N(u) - \int \prod_{i=1}^{k} f_i(u_i)\, dQ_N(u) \right] = 0$$

Still, this propagation of chaos result is valid when the expectation of the test function is
taken with respect to the connection distribution. Thus, it does not say anything precise about
the observation relative to a single large-sized network.
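The counting step in the proof of theorem 10 can be made concrete: among the $N^k$ maps from $\{1, \ldots, k\}$ to $\{1, \ldots, N\}$, the injective ones number $N!/(N-k)!$. The short sketch below (the function name `injective_fraction` is a hypothetical choice) evaluates that proportion and shows how fast it approaches 1.

```python
from math import perm

def injective_fraction(N, k):
    """Proportion of maps {1..k} -> {1..N} that are injective: N! / ((N-k)! N^k)."""
    return perm(N, k) / N**k

for N in (10, 100, 10_000, 1_000_000):
    print(N, injective_fraction(N, 3))
```

For fixed $k$ the deficit from 1 is of order $k(k-1)/(2N)$, which is why repeated $k$-tuples do not contribute in the limit.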

Actually, since exponentially fast convergence in probability implies almost-sure convergence
by the Borel-Cantelli lemma, we are able to infer the following statement from theorem 8. Recall
that we note (as in the proof of theorem 3) $Q_N(J)$ the conditional distribution of the network
state trajectory given $J$, the system of synaptic weights, and we define $\mu_N(u) = \frac{1}{N} \sum_{i=1}^{N} \delta_{u_i}$ for
the empirical measure on $\mathcal{F}$ which is associated to a network trajectory $u \in \mathcal{F}^N$.

Theorem 11 Let $F$ be a bounded continuous functional on $\mathcal{P}(\mathcal{F})$. We have almost surely in $J$

$$\lim_{N \to \infty} \int F[\mu_N(u)]\, dQ_N(J)(u) = F(\mu_T)$$

Note that we cannot use this theorem to infer a "quenched" propagation of chaos result similar
to theorem 10, which was an "annealed" propagation of chaos result (i.e. averaged over the
connection weight distribution). This is not possible because, for a given network configuration
$J$, $Q_N(J)$ is no longer symmetric with respect to the individual neurons. Nevertheless, we
obtain the following crucial result, applying theorem 11 to the case where $F$ is the linear form
$F(\mu) = \int f\, d\mu$.

Theorem 12 Let $f$ be a bounded continuous function on $\mathcal{F}$. We have almost surely in $J$

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \int f(u_i)\, dQ_N(J)(u) = \int f(\eta)\, d\mu_T(\eta)$$

The consequences of these results are developed in the next section.



4 Mean-field dynamics for analog networks 

We are interested in the stationary dynamics of large random recurrent neural networks. Moreover,
since we want to study the meaning of oscillations and of (deterministic) chaos observed
in the finite sized models (see paper I), the regime of low noise is especially interesting since the
oscillations are practically canceled if the noise is too strong. For these reasons, obtaining the
limit of the empirical measures is not sufficient in practice. So we shall extract
from $\mu_T$ dynamical information on the asymptotics of the network trajectories. Notice that
the distribution of the connexion weights is not necessarily Gaussian as long as it
satisfies hypothesis (H).



4.1 Mean-field dynamics of homogeneous networks 

4.1.1 General mean-field equations for moments 

Recall that in section 2 of this chapter (definition 2) we defined for any probability measure
$\mu \in \mathcal{P}(\mathcal{F})$ the two first moments of $\mu$, $m_\mu$ and $c_\mu$. Let us recall these notations:

$$m_\mu(t+1) = \bar{J} \int f[\eta(t)]\, d\mu(\eta)$$
$$c_\mu(s+1, t+1) = J^2 \int f[\eta(s)]\, f[\eta(t)]\, d\mu(\eta)$$
$$q_\mu(t+1) = c_\mu(t+1, t+1)$$

where $f$ is the sigmoid function $f(x) = \frac{e^x}{1 + e^x}$.

In this section, in order to alleviate notations, we note $m, c, q$ instead of $m_{\mu_T}, c_{\mu_T}, q_{\mu_T}$, where
$\mu_T$ is the asymptotic probability that was shown to be a fixed point of the mean-field evolution
operator $L$ in the last section. By expressing that $\mu_T$ is a fixed point of $L$, we shall produce
autonomous evolution dynamics on the moments $m, c, q$.

More precisely we have from the definition of $L$ (see definition 3 in section 3) that the law
of $\eta(t)$ under $\mu_T$ is a Gaussian law of mean $m(t) - \theta$ and of variance $q(t) + \sigma^2$ (see equations
(20) and (21)). So we have

$$\begin{cases} m(t+1) = \bar{J} \int f[\sqrt{q(t) + \sigma^2}\, \xi + m(t) - \theta]\, d\gamma(\xi) \\ q(t+1) = J^2 \int f[\sqrt{q(t) + \sigma^2}\, \xi + m(t) - \theta]^2\, d\gamma(\xi) \end{cases} \qquad (28)$$

where $\gamma$ is the standard Gaussian probability on $\mathbb{R}$: $d\gamma(\xi) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\xi^2}{2}\right) d\xi$.
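The two-dimensional recurrence (28) is easy to iterate numerically. The following is a minimal sketch, assuming the Gaussian integrals are approximated by Gauss-Hermite quadrature (a choice made here, not the paper's method); the function names and the parameter values are hypothetical.

```python
import numpy as np

# Gauss-Hermite (probabilists') rule: sum_i w_i g(x_i) ~ integral of g(x) exp(-x^2/2) dx
NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)  # normalize to the standard Gaussian law

def sigmoid(x):
    # f(x) = e^x / (1 + e^x), written in an overflow-free form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def mean_field_step(m, q, J_bar, J, theta, sigma):
    """One step of system (28) for the pair (m(t), q(t))."""
    rate = sigmoid(np.sqrt(q + sigma**2) * NODES + m - theta)
    m_next = J_bar * np.sum(WEIGHTS * rate)
    q_next = J**2 * np.sum(WEIGHTS * rate**2)
    return m_next, q_next

def iterate_mean_field(m0, q0, steps, **params):
    """Trajectory of (m, q) as an array of shape (steps + 1, 2)."""
    traj = [(m0, q0)]
    for _ in range(steps):
        traj.append(mean_field_step(*traj[-1], **params))
    return np.array(traj)

traj = iterate_mean_field(0.3, 0.1, 200, J_bar=0.0, J=1.0, theta=0.0, sigma=0.0)
print(traj[-1])  # approximate stationary values (m*, q*)
```

The quadrature is adequate for moderate values of $J$; for very large $J$ the integrand becomes steep and a finer integration rule would be needed.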

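Moreover, the covariance of $(\eta(s), \eta(t))$ under $\mu_T$ is $c(s,t)$ if $s \neq t$. Thus in this case,
considering the standard integration formula of a 2-dimensional Gaussian vector:

$$E[f(X)g(Y)] = \iint f\left( \frac{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y) - \mathrm{Cov}(X,Y)^2}}{\sqrt{\mathrm{Var}(Y)}}\, \xi_1 + \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(Y)}}\, \xi_2 + E(X) \right) g\left[ \sqrt{\mathrm{Var}(Y)}\, \xi_2 + E(Y) \right] d\gamma(\xi_1)\, d\gamma(\xi_2)$$

we obtain the following evolution equation for covariance:

$$c(s+1, t+1) = J^2 \iint f\left( \frac{\sqrt{[q(s)+\sigma^2][q(t)+\sigma^2] - c(s,t)^2}}{\sqrt{q(t)+\sigma^2}}\, \xi_1 + \frac{c(s,t)}{\sqrt{q(t)+\sigma^2}}\, \xi_2 + m(s) - \theta \right) f\left[ \sqrt{q(t)+\sigma^2}\, \xi_2 + m(t) - \theta \right] d\gamma(\xi_1)\, d\gamma(\xi_2) \qquad (29)$$

The dynamics of the mean-field system (28,29) can be studied as a function of the parameters:

— the mean $\bar{J}$ of the connexion weights,

— the standard deviation $J$ of the connexion weights,

— the firing threshold $\theta$ of neurons.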

Notice that the time and size limits do not necessarily commute. Therefore, any result on long 
time dynamics of the mean-field system may not be an exact prediction of the large-size limit of 
stationary dynamics of random recurrent networks. However, for our model, extensive numerical 
simulations have shown ([12], [17] and chapter I) that the time asymptotics of the mean-field 
system is informative about moderately large random recurrent network stationary dynamics 
(from size of some hundred neurons). 

More precisely, in the low noise limit ($\sigma \ll 1$), two points of view are interesting:

— the ensemble stationary dynamics is given by the study of the time asymptotics of the
dynamical system

$$\begin{cases} m(t+1) = \bar{J} \int f[\sqrt{q(t)}\, \xi + m(t) - \theta]\, d\gamma(\xi) \\ q(t+1) = J^2 \int f[\sqrt{q(t)}\, \xi + m(t) - \theta]^2\, d\gamma(\xi) \end{cases} \qquad (30)$$

— the synchronization of the individual neuron trajectories. Actually, $m(t)$ and $q(t)$ may
converge, when $t \to \infty$, towards limits $m^*$ and $q^*$ (stable equilibria of the dynamical system
(30)) with a great variety of dynamical behaviors. Each individual trajectory may converge
to a fixed point and $(m^*, q^*)$ are the statistical moments of the fixed point empirical
distributions. Another case is provided by individual chaotic oscillations around $m^*$ where $q^*$
measures the amplitude of the oscillations.

The discrimination between these two situations, which are very different from the point of view
of neuron dynamics, is given by the study of the mean quadratic distance which will be outlined
in the next paragraph.

4.1.2 Study of the mean quadratic distance

The concept of mean quadratic distance was introduced by Derrida and Pomeau in [19] to
study the chaotic dynamics of extremely diluted large size networks. The method consists in
checking the sensitivity of the dynamical system to initial conditions. The idea is the following:
let us consider two network trajectories $u^{(1)}$ and $u^{(2)}$ of the same network configuration, which
is given by the synaptic weight matrix $(J_{ij})$. Their mean quadratic distance is defined by

$$d_{1,2}(t) = \frac{1}{N} \sum_{i=1}^{N} \left[ u_i^{(1)}(t) - u_i^{(2)}(t) \right]^2$$

For a given configuration, if the network trajectory converges towards a stable equilibrium or
towards a limit cycle (synchronous individual trajectories), then the mean quadratic distance
between closely initialized trajectories goes to 0 when time goes to infinity. On the contrary,
when this distance stays far from 0, for instance converges towards a non zero limit, however
close the initial conditions are, the network dynamics presents in some sense "sensitivity to
initial conditions" and thus this behavior of the mean quadratic distance can be considered
to be symptomatic of chaos. We applied this idea in [11] to characterize instability of random
recurrent neural networks.
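The diagnostic described above can be tried on a finite network. The sketch below is only illustrative and rests on assumptions not stated in the text: noise-free dynamics $u_i(t+1) = \sum_j J_{ij} f(u_j(t)) - \theta$, Gaussian weights with zero mean and variance $J^2/N$, and hypothetical sizes, gains and step counts.

```python
import numpy as np

def sigmoid(x):
    # f(x) = e^x / (1 + e^x), written in an overflow-free form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def mean_quadratic_distance(J, N=400, steps=300, eps=1e-6, theta=0.0, seed=0):
    """Distance after `steps` iterations between two closely initialized trajectories."""
    rng = np.random.default_rng(seed)
    # Assumption: balanced Gaussian weights, mean 0 and variance J^2 / N.
    weights = rng.normal(0.0, J / np.sqrt(N), size=(N, N))
    u1 = rng.normal(0.0, 1.0, size=N)
    u2 = u1 + eps * rng.normal(0.0, 1.0, size=N)  # small perturbation of u1
    for _ in range(steps):
        u1 = weights @ sigmoid(u1) - theta
        u2 = weights @ sigmoid(u2) - theta
    return np.mean((u1 - u2) ** 2)

print(mean_quadratic_distance(J=1.0))   # contracting regime: distance collapses
print(mean_quadratic_distance(J=30.0))  # strong coupling: distance expected to stay finite
```

A single weight matrix is used for both trajectories, as in the definition; only the initial conditions differ.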

In the context of large deviation based mean-field theory, the trajectories $u^{(1)}$ and $u^{(2)}$ are
submitted to independent synaptic noises and the mean quadratic distance is defined by

$$d_{1,2}(t) = \frac{1}{N} \sum_{i=1}^{N} \int \left[ u_i^{(1)}(t) - u_i^{(2)}(t) \right]^2 dQ_N^{(1,2)}(u^{(1)}, u^{(2)}) \qquad (31)$$

where $Q_N^{(1,2)}$ is the joint probability law on $\mathcal{F}^{2N}$ of the network trajectories $(u^{(1)}, u^{(2)})$ over the
time interval $\{0, \ldots, T\}$. Following the same lines as in the last sections, it is easy to show a large
deviation principle for the empirical measure of the sample $(u_i^{(1)}, u_i^{(2)})_{i \in \{1, \ldots, N\}}$ under $Q_N^{(1,2)}$
when $N \to \infty$. Then we get the almost sure convergence theorem

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \int f_1(u_i^{(1)})\, f_2(u_i^{(2)})\, dQ_N^{(1,2)}(J)(u^{(1)}, u^{(2)}) = \int f_1(\eta_1)\, f_2(\eta_2)\, d\mu_T^{(1,2)}(\eta_1, \eta_2)$$

where $\mu_T^{(1,2)}$ is the fixed point of the mean-field evolution operator $L^{(1,2)}$ of the joint trajectories,
which is defined on the probability measure set $\mathcal{P}(\mathcal{F} \times \mathcal{F})$ exactly in the same way as $L$ was
defined previously in definition 3.

Then if we define the instantaneous covariance between two trajectories by:

Definition 5 The instantaneous cross covariance between the two trajectories under their joint
probability law is defined by

$$c_{1,2}(t) = \int \eta_1(t)\, \eta_2(t)\, d\mu_T^{(1,2)}(\eta_1, \eta_2) \qquad (32)$$

where $\mu_T^{(1,2)}$ is the fixed point measure of the joint evolution operator $L^{(1,2)}$ defined from an
initial condition $m_0^{(1,2)}$

then we can follow the argument which was already used for the covariance evolution equation
(29). Thus we obtain the following evolution equation for the instantaneous cross covariance

$$c_{1,2}(t+1) = J^2 \iint f\left( \frac{\sqrt{[q_1(t)+\sigma^2][q_2(t)+\sigma^2] - c_{1,2}(t)^2}}{\sqrt{q_2(t)+\sigma^2}}\, \xi_1 + \frac{c_{1,2}(t)}{\sqrt{q_2(t)+\sigma^2}}\, \xi_2 + m_1(t) - \theta \right) f\left[ \sqrt{q_2(t)+\sigma^2}\, \xi_2 + m_2(t) - \theta \right] d\gamma(\xi_1)\, d\gamma(\xi_2) \qquad (33)$$

The proof is detailed in [33].

It is now straightforward to infer the evolution of the mean quadratic distance from the following
square expansion.

Proposition 13 The mean quadratic distance obeys the relation

$$d_{1,2}(t) = q_1(t) + q_2(t) - 2c_{1,2}(t) + [m_1(t) - m_2(t)]^2 \qquad (34)$$

4.1.3 Study of the special case of balanced inhibition 

In order to show how the previous equations are used we shall display the special case of
balanced inhibition and excitation. The study of the discrete time 1-dimensional dynamical
system with different parameters was addressed in paper I. See also ([12] and [17]) for more
details.

We choose in the previous model the special case where $\bar{J} = 0$. This choice simplifies
considerably the evolution study since $\forall t$, $m(t) = 0$ and the recurrence over $q(t)$ is autonomous.
So we have just to study the attractors of a single real function.

Moreover, the interpretation of $\bar{J} = 0$ is that there is a general balance in the network
between inhibitory and excitatory connections. Of course, the model is still far from biological
plausibility since the generic neuron is endowed both with excitatory and inhibitory functions.
In the next section, the model with several populations will be addressed. Nevertheless, the case
$\bar{J} = 0$ is of special interest. In the limit of low noise, the mean-field dynamical system amounts
to the recurrence equation:

$$q(t+1) = J^2 \int f[\sqrt{q(t)}\, \xi - \theta]^2\, d\gamma(\xi) \qquad (35)$$

We can scale $q(t)$ to $J^2$ and we obtain

$$q(t+1) = \int f[J\sqrt{q(t)}\, \xi - \theta]^2\, d\gamma(\xi) = h_{J^2,\theta}[q(t)] \qquad (36)$$

where the function $h_{J^2,\theta}$ of $\mathbb{R}^+$ into $\mathbb{R}^+$ is defined by

$$h_{J^2,\theta}(q) = \int f[J\sqrt{q}\, \xi - \theta]^2\, d\gamma(\xi)$$

This function is positive, increasing and tends to 0.5 when $q$ tends to infinity. The recurrence
(36) admits on $\mathbb{R}^+$ a single stable fixed point $q^*(J^2, \theta)$. This fixed point is increasing with $J^2$ and
decreasing with $\theta$. We represent in figure 1 the diagram of the variations of the function $q^*(J^2, \theta)$. It
is obtained from a numerical simulation with a computation of $h_{J^2,\theta}$ by a Monte-Carlo method.

Let us now consider the stability of the network dynamics by studying the covariance and
the mean quadratic distance evolution equation. The covariance evolution equation (29) in the
low noise limit and when $t \to \infty$ amounts to

$$c(s+1, t+1) = J^2 \iint f\left( \frac{\sqrt{q^{*2} - c(s,t)^2}}{\sqrt{q^*}}\, \xi_1 + \frac{c(s,t)}{\sqrt{q^*}}\, \xi_2 - \theta \right) f\left( \sqrt{q^*}\, \xi_2 - \theta \right) d\gamma(\xi_1)\, d\gamma(\xi_2) \qquad (37)$$

Scaling the covariance with $J^2$ we obtain the recurrence

$$c(s+1, t+1) = H_{J^2,\theta,q^*}[c(s,t)]$$

with

$$H_{J^2,\theta,q^*}(c) = \iint f\left( J\frac{\sqrt{q^{*2} - c^2}}{\sqrt{q^*}}\, \xi_1 + J\frac{c}{\sqrt{q^*}}\, \xi_2 - \theta \right) f\left( J\sqrt{q^*}\, \xi_2 - \theta \right) d\gamma(\xi_1)\, d\gamma(\xi_2) \qquad (38)$$

Fig. 1. Variations of the fixed point $q^*(J, \theta)$ as a function of the network configuration parameters

Fig. 2. Variations of $q^* - c^*$ as a function of the network configuration parameters $J$ and $\theta$



It is clear from comparing with equation (35) that $q^*$ is a fixed point of $H_{J^2,\theta,q^*}$. To study
the stability of this fixed point, standard computation shows that

$$\frac{\partial H_{J^2,\theta,q^*}}{\partial c}(q^*) = J^2 \int f'\left( J\sqrt{q^*}\, \xi - \theta \right)^2 d\gamma(\xi) \qquad (39)$$

Then, as it is stated in paper I, the condition $\frac{\partial H_{J^2,\theta,q^*}}{\partial c}(q^*) < 1$ is a necessary and sufficient
condition for the stability of $q^*$. A detailed and rigorous proof for $\theta = 0$ is provided in [33].
Then two cases occur.

— In the first case where $\frac{\partial H_{J^2,\theta,q^*}}{\partial c}(q^*) < 1$, the stationary limit of $c(s+\tau, t+\tau)$ when $\tau \to \infty$
does not depend on $t - s$ and is $c^* = q^*$. The stationary limit of the mean-field Gaussian
process is a random point. Its variance is increasing with $J^2$ and decreasing with $\theta$.

— In the second case where $\frac{\partial H_{J^2,\theta,q^*}}{\partial c}(q^*) > 1$, the stationary limit of $c(s+\tau, t+\tau)$ when
$\tau \to \infty$ does not depend on $t - s$ when $t - s \neq 0$ and is equal to $c^* < q^*$. The stationary
limit of the Gaussian process is the sum of a random
point and of a white noise. From the dynamical system point of view, this corresponds to
a chaotic regime (with infinitely many degrees of freedom). The signature of chaos is given
by the evolution of the mean quadratic distance. The instantaneous covariance converges
also towards $c^*$. Therefore the mean quadratic distance converges towards a non zero limit,
which is independent of the initial condition distance. As shown in [11] the transition from
fixed point to chaos is given by an explicit equation which is the same as the equation of
the De Almeida-Thouless line [5] in spin-glasses models. The analogy between these two
systems is further developed in [6,11].
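The stability criterion can be evaluated numerically. The sketch below works in the unscaled variable of recurrence (35), where the criterion reads $J^2 \int f'(\sqrt{q^*}\, \xi - \theta)^2\, d\gamma(\xi) < 1$ (this follows from differentiating the Gaussian expectation of $f(X)f(Y)$ with respect to the covariance of $X$ and $Y$). The grid-based integration, the function names and the parameter values are assumptions chosen for the example.

```python
import numpy as np

# Uniform grid for Gaussian integrals; the tails beyond |xi| = 8 are negligible.
XI = np.linspace(-8.0, 8.0, 8001)
GAUSS = np.exp(-0.5 * XI**2) / np.sqrt(2.0 * np.pi) * (XI[1] - XI[0])

def sigmoid(x):
    # f(x) = e^x / (1 + e^x), written in an overflow-free form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def gauss_mean(values):
    """Approximate the integral of `values` against the standard Gaussian law."""
    return np.sum(values * GAUSS)

def fixed_point_q(J, theta, iterations=200):
    """Iterate recurrence (35), q <- J^2 * E[f(sqrt(q) xi - theta)^2]."""
    q = 1.0
    for _ in range(iterations):
        q = J**2 * gauss_mean(sigmoid(np.sqrt(q) * XI - theta) ** 2)
    return q

def stability_slope(J, theta):
    """J^2 * E[f'(sqrt(q*) xi - theta)^2]; a value above 1 signals the chaotic regime."""
    q_star = fixed_point_q(J, theta)
    f = sigmoid(np.sqrt(q_star) * XI - theta)
    f_prime = f * (1.0 - f)  # derivative of the logistic sigmoid
    return J**2 * gauss_mean(f_prime**2)

for J in (1.0, 5.0, 20.0):
    print(J, fixed_point_q(J, 0.0), stability_slope(J, 0.0))
```

Scanning `stability_slope` over a grid of $(J, \theta)$ values and locating the level set equal to 1 gives a numerical picture of the transition line discussed in the text.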

Figures 1 and 2 show the evolution of $q^*$ and $q^* - c^*$ as functions of $J^2$ and $\theta$. When
$J^2$ is small, there is no bifurcation. When $J^2$ is larger, a transition to chaos occurs when $\theta$ is
decreasing. When $J^2$ is growing, the transition to chaos occurs for increasing $\theta$ values. Figure
31 of paper I shows that varying the input (which is equivalent to varying the threshold)
makes it possible to hold up the occurrence of chaos.

4.2 Mean-field dynamics of 2-population AFRRNN 

4.2.1 2-population AFRRNN model 

As it was announced previously, the assumption of a homogeneous connexion weight model
is not plausible. Besides, in the literature, RRNN models with several neuron populations have
been studied as early as 1977 with [2] and have been thoroughly investigated in the last
decade (see for instance [27]). The heterogeneity of neuron populations induces interesting and
complex dynamical phenomena such as synchronization. Actually the mean-field theory that
was developed in the previous sections may be extended without major difficulty to several
neuron populations. To give a practical idea of what can be obtained with such extensions we consider
here two populations with respectively $N_1 = \lambda N$ and $N_2 = (1 - \lambda)N$ neurons, where $\lambda \in\, ]0, 1[$
and where $N \to \infty$.

Four connexion random matrices have to be considered in this model, $J_{11}, J_{12}, J_{21}, J_{22}$,
where $J_{ij}$ is the matrix of connexion weights from population $j$ to population $i$. The random
matrix $J_{ij}$ is a $(N_i \times N_j)$ random matrix with independent identically distributed entries. Their
distribution is governed by statistical parameters $(\bar{J}_{ij}, J_{ij}^2)$ and obeys hypothesis (H). They are
independent altogether.

However, the technical hypothesis (H) does not allow us to give to connexion weights a
rigorously constant sign, permitting to distinguish between inhibitory and excitatory neurons.
Indeed, there is no probability distribution on positive (resp. negative) real numbers having
a mean and a variance respectively scaling as $\bar{J}/N$ and $J^2/N$. Thus, the positivity of the support
induces on the other side of the distribution a heavy tail which will not respect assumption
(iii) in hypothesis (H). However, it is possible to consider probability distributions which
satisfy hypothesis (H) and which load the negative numbers (or alternatively the
positive ones) with arbitrarily small probability.

We consider here a 2-population model with a population of excitatory neurons and a
population of inhibitory neurons (up to the above restriction).

4.2.2 General mean-field equations for moments 

A large deviation principle may be obtained for the 2-population model for Gaussian connexion
weights. So, the convergence in finite time to the mean-field dynamics is shown in the present
model with the same proof as in the previous 1-population model. See [33] for a rigorous proof
and [16] for a more practical statement of results. The limit of the empirical measure is the
law of a Gaussian vector which takes its values in $\mathcal{F} \times \mathcal{F}$. Each factor describes the
distribution of one neural population. Note that the two components are independent. As for the
1-population model we note $m_k(t), q_k(t), c_k(s,t)$ the mean, variance and covariance at given
times of the empirical measure of population $k$ ($k \in \{1, 2\}$). The mean-field evolution equation
for these moments is described by the following system:

$$\begin{cases} m_k(t+1) = \sum_{j \in \{1,2\}} \bar{J}_{kj} \int f\left[ \sqrt{q_j(t) + \sigma^2}\, \xi + m_j(t) - \theta_j \right] d\gamma(\xi) \\ q_k(t+1) = \sum_{j \in \{1,2\}} J_{kj}^2 \int f\left[ \sqrt{q_j(t) + \sigma^2}\, \xi + m_j(t) - \theta_j \right]^2 d\gamma(\xi) \\ c_k(s+1, t+1) = \sum_{j \in \{1,2\}} J_{kj}^2 \iint f\left( \frac{\sqrt{[q_j(s)+\sigma^2][q_j(t)+\sigma^2] - c_j(s,t)^2}}{\sqrt{q_j(t)+\sigma^2}}\, \xi_1 + \frac{c_j(s,t)}{\sqrt{q_j(t)+\sigma^2}}\, \xi_2 + m_j(s) - \theta_j \right) \\ \qquad\qquad \times f\left[ \sqrt{q_j(t)+\sigma^2}\, \xi_2 + m_j(t) - \theta_j \right] d\gamma(\xi_1)\, d\gamma(\xi_2) \end{cases} \qquad (40)$$
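The first two lines of system (40) can be iterated in a few lines of code. The sketch below is an illustration under assumptions: the sigmoid $f(x) = e^x/(1+e^x)$ of section 4.1 is reused, the integrals are computed on a uniform grid, and the coupling matrices follow the pattern "means $gd$, $-2gd$, $gd$, $0$ and standard deviations $g$, $\sqrt{2}g$, $g$, $0$" with thresholds $0$ and $0.3$. The names `two_population_step` and `coupling_matrices` are hypothetical.

```python
import numpy as np

# Uniform grid for Gaussian integrals; the tails beyond |xi| = 8 are negligible.
XI = np.linspace(-8.0, 8.0, 4001)
GAUSS = np.exp(-0.5 * XI**2) / np.sqrt(2.0 * np.pi) * (XI[1] - XI[0])

def sigmoid(x):
    # f(x) = e^x / (1 + e^x), written in an overflow-free form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def two_population_step(m, q, J_bar, J_sq, theta, sigma=0.0):
    """One step of the m_k and q_k equations of (40); m, q, theta have shape (2,)."""
    # arg[j, :] samples the Gaussian potential of population j on the grid
    arg = np.sqrt(q + sigma**2)[:, None] * XI[None, :] + (m - theta)[:, None]
    rate = sigmoid(arg)
    mean_rate = rate @ GAUSS          # integral of f for each population j
    mean_sq_rate = (rate**2) @ GAUSS  # integral of f^2 for each population j
    return J_bar @ mean_rate, J_sq @ mean_sq_rate

def coupling_matrices(g, d):
    """Assumed parameter pattern: row k = target population, column j = source."""
    J_bar = np.array([[g * d, -2.0 * g * d], [g * d, 0.0]])
    J_sq = np.array([[g**2, 2.0 * g**2], [g**2, 0.0]])  # squared standard deviations
    return J_bar, J_sq

J_bar, J_sq = coupling_matrices(g=2.0, d=1.0)
theta = np.array([0.0, 0.3])
m, q = np.zeros(2), np.ones(2)
for _ in range(50):
    m, q = two_population_step(m, q, J_bar, J_sq, theta)
print(m, q)
```

Recording the sequence of `(m, q)` values instead of only the last one makes it possible to see whether the moments settle to a fixed point or keep oscillating for a given pair $(g, d)$.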



4.2.3 Results and discussion 

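As far as numerical studies are concerned, we choose the following values for the statistical
parameters

$$\begin{cases} \bar{J}_{11} = gd & J_{11} = g \\ \bar{J}_{12} = -2gd & J_{12} = \sqrt{2}\, g \\ \bar{J}_{21} = gd & J_{21} = g \\ \bar{J}_{22} = 0 & J_{22} = 0 \end{cases} \qquad (41)$$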



In this study, according to some biological scheme, excitatory neurons are connected both to
excitatory neurons and inhibitory neurons, and inhibitory neurons are connected only to
excitatory neurons. Moreover, the number of parameters is reduced to allow numerical exploration
of the synchronization parameter. We keep two independent parameters:

— $g$ stands for the non linearity of the transfer function,

— $d$ stands for the differentiation of the two populations (inhibitory vs. excitatory).

Considering the firing thresholds as previously, there is no variation about individual
thresholds. The excitatory neuron threshold $\theta_1$ is chosen equal to 0 and the inhibitory neuron threshold $\theta_2$ is
chosen equal to 0.3 because the activation potential of inhibitory neurons is always positive.

In the bifurcation map of figure 3 (extracted from [16]) several dynamical regimes are displayed
together with the corresponding numerical ranges of parameters $d$ and $g$. Notice that the theoretical
predictions of the mean-field equations (40) and the large scale simulations of large-sized network
behavior are consistent.

As in the homogeneous case there is a transition to chaos for weak $d$. When the differentiation
parameter $d$ is sufficiently large (about 2), the fixed point loses its stability through a Hopf
bifurcation to give rise to synchronous oscillations when $g$ is growing. There is then a succession
of bifurcations leading to chaos (see paper I).

Moreover, a new phenomenon occurs. For large $g$, there is a significant transition regime
between stationary chaos and synchronized oscillations which is named "cyclo-stationary chaos".
In that regime, statistical parameters exhibit regular periodic oscillations, though individual
trajectories diverge with a mean quadratic distance behaviour which is characteristic
of chaos.



5 MFT-based oscillation analysis in IF networks. 

In this section we would like to give an interesting application of mean-field approaches for
spiking neurons. It was developed in [9]. This paper is part of a current of research which
studies the occurrence of synchronized oscillations in recurrent spiking neural networks [4,3,8],
in order to give an account of spatio-temporal synchronization effects which are observed in
many situations in neural systems [24,36,10,35].

Fig. 3. Bifurcation map of the 2-population model



5.1 IFRRNN continuous-time model 

The model of [9] has continuous time. There is no synaptic noise but the neurons are submitted
to a random external input. So, equation (9) has to be replaced by

$$\begin{cases} u(t) < \theta \Rightarrow \tau \dot{u}(t) = -u(t) + v_{net}(t) + v_{ext}(t) \\ u(t - 0) = \theta \Rightarrow u(t + 0) = \vartheta \end{cases} \qquad (42)$$

where

— $\tau$ is the characteristic time of the neuron,

— $v_{net}$ is the synaptic input from the network,

— $v_{ext}$ is the external input,

— $\vartheta$ is the reset potential and $0 < \vartheta < \theta$. Note that $u(t - 0)$ and $u(t + 0)$ are respectively the
left and right limits of $u$ at firing time $t$. Thus, the refractory period is assumed to be zero.

This model of continuous time neuron dynamics is introduced in paper I, section 2.2.4.

Moreover, since the inputs are modeled by continuous-time stochastic processes, equation
(42) is a stochastic differential equation of the type

$$\tau\, du(t) = -u(t)\, dt + dV_t \qquad (43)$$

with $dV(t) = dV_{ext}(t) + dV_{net}(t)$.

We now make these stochastic processes explicit, in order to obtain the Fokker-Planck
equation of the network dynamics in a mean-field approximation.



5.2 Modeling the external input 

The network is a recurrent inhibitory network and we study its reaction to random excitatory
synaptic inputs. We suppose that in the network each neuron receives excitations from $C_{ext}$
external neurons connected via constant excitatory synapses $J_{ext}$. The corresponding external
current is a Poisson process with emission frequency $\nu_{ext}$.

Let us examine the effect of a superimposition of a large number $C$ of independent identically
distributed low-rate $\nu$ Poisson processes. Put

$$I(t) = J \sum_{i=1}^{C} N_i(t)$$

where the $N_i(t)$ are i.i.d. Poisson processes with firing rate $\nu$. Then $I(t)$ is a stochastic process
with independent stationary increments such that $E(I(t)) = \mu t = JC\nu t$ and $\mathrm{Var}(I(t)) = \sigma^2 t =
J^2 C \nu t$. Thus $\mu = JC\nu$ and $\sigma = J\sqrt{C\nu}$.

We are interested in studying such processes when they reach the firing threshold $\theta$, which
is far larger than the elementary increment $J$. In typical neural applications, $J = 0.1$ mV and
$\theta = 20$ mV. At this level, operating a classical time-space rescaling, $I(t)$ appears like a Gaussian
process with independent increments and the same moments. We have

$$dI(t) \sim \mu\, dt + \sigma\, dB_t$$

where $(B_t)$ is the standard Brownian motion. If we apply this to the external synaptic
input we get the following modeling in the limit of large size and low rate

$$dV_{ext}(t) = \mu_{ext}\, dt + \sigma_{ext}\, dB(t)$$

with $\mu_{ext} = J_{ext} C_{ext} \nu_{ext}$ and $\sigma_{ext} = J_{ext}\sqrt{C_{ext} \nu_{ext}}$.
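The first two moments used in this diffusion approximation can be checked by sampling. The sketch below uses the standard fact that the superposition of $C$ independent Poisson processes of rate $\nu$ is a Poisson process of rate $C\nu$; the function name and the numerical values ($C = 1000$, $\nu = 5$ Hz, $t = 0.2$ s) are hypothetical choices, with $J = 0.1$ mV as in the text.

```python
import numpy as np

def superposed_poisson_moments(J=0.1, C=1000, nu=5.0, t=0.2, n_samples=200_000, seed=0):
    """Empirical and predicted mean/variance of I(t) = J * (number of input spikes in [0, t])."""
    rng = np.random.default_rng(seed)
    # The sum of C independent Poisson processes of rate nu is Poisson with rate C * nu.
    counts = rng.poisson(C * nu * t, size=n_samples)
    I_t = J * counts
    return I_t.mean(), I_t.var(), J * C * nu * t, J**2 * C * nu * t

mean_emp, var_emp, mean_th, var_th = superposed_poisson_moments()
print(mean_emp, mean_th)  # drift term: mu * t with mu = J C nu
print(var_emp, var_th)    # diffusion term: sigma^2 * t with sigma = J sqrt(C nu)
```

With these values each sample contains about 1000 elementary jumps of size $J$, which is the regime where the Gaussian approximation of the increments is reasonable.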

5.3 Mean-field approximation of the internal input 

In the framework of continuous-time modeling, the synaptic input definition of $v_{net}$ for IF
neuron $i$ which was, according to equation (3),

$$v_i(J, u)(t+1) = \sum_{j=1}^{N} J_{ij}\, x_j(t),$$

has to be replaced by

$$v_i(J, u)(t) = \tau \sum_{j=1}^{N} J_{ij} \sum_{k} \delta\left( t - T_j^k(u) - D \right) \qquad (44)$$

where

— $\delta$ is the Dirac distribution,

— $T_j^k(u)$ are the successive firing times of neuron $j$ during the network trajectory $u$,

— $D$ is the synaptic transmission delay.

In the present study, the network is supposed to be sparsely connected. All the connexion
weights are equal to $-J$ as soon as they are non zero. Each neuron is connected to $C$ neurons
which are randomly drawn among the network, with $C \ll N$, where $C$ is a fixed
integer and $N$ is the total number of neurons. Another model is considered further where the
connection weights are independent random variables equal to $-J$ with probability $C/N$ and to
0 otherwise. We shall focus here on the first model.

In previous sections, mean-field approximation in the finite time set framework consisted in
finding a fixed point for the mean-field propagation operator $L$. Namely:

— approximating random vectors $v_i$ by Gaussian vectors with a probability distribution $g_\mu$,
where $\mu$ is a probability law on the individual neuron potential trajectory space
(finite-dimensional vector space),

— finding $\mu$ as the probability law of the neuron dynamical equation with this approximation
for the synaptic input.

The mean-field approximation in [9] follows the same logic.

The first step of the mean field approximation consists, for a given rate function $\nu$, in defining 
the non stationary Gaussian process 

$$dV_{net}(t) = \mu_{net}(t)\,dt + \sigma_{net}(t)\,dB(t) \qquad (45)$$

where 

— the drift $\mu_{net}$ is given by 

$$\mu_{net}(t) = -CJ\nu(t-D)\tau \qquad (46)$$

— and where the diffusion coefficient $\sigma_{net}$ is given by 

$$\sigma_{net}(t)^2 = J^2 C\nu(t-D)\tau \qquad (47)$$

The second step consists in considering the following diffusion with "tunneling effect" 

$$\begin{cases} u(t) < \theta \Rightarrow \tau\,du(t) = -u(t)\,dt + dV_{net}(t) + dV_{ext}(t) \\ u(t-0) = \theta \Rightarrow u(t+0) = \vartheta \end{cases} \qquad (48)$$

The terminology "tunneling effect", referring to quantum mechanics, is somewhat curious 
here. It has its roots in the following remark. Whenever the membrane potential reaches $\theta$ it is 
reset to $\vartheta$. If we interpret eq. (48) in the context of a random particle motion, the "particle" 
is "instantaneously transported" from the point $\theta$ to $\vartheta$. This analogy is not only formal. The 
"tunneling effect" induces a specific behavior for the probability current at the boundary $u = \theta$. 
In the present model, this current is directly related to the firing rate (see next section). 
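
To make the threshold-and-reset diffusion (48) concrete, here is a minimal Euler-Maruyama sketch. It is an illustration only and rests on several assumptions that are not in the source: the drift $\mu$ and the diffusion coefficient $\sigma$ are frozen to constants, time is measured in units of $\tau$ (so $\tau = 1$), and the function name `simulate_firing_rate` and all numerical values are hypothetical.

```python
import math
import random


def simulate_firing_rate(mu, sigma, theta, reset, dt=1e-3, n_steps=200000, seed=0):
    """Euler-Maruyama scheme for du = (mu - u) dt + sigma dB with reset at threshold.

    Assumes constant mu and sigma and a time unit equal to tau (tau = 1).
    Returns the number of threshold crossings per unit time.
    """
    rng = random.Random(seed)
    u = reset
    spikes = 0
    noise_scale = sigma * math.sqrt(dt)
    for _ in range(n_steps):
        u += (mu - u) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if u >= theta:      # the "particle" reaches theta ...
            u = reset       # ... and is instantaneously transported to the reset value
            spikes += 1
    return spikes / (n_steps * dt)


# Strong drift, weak noise: the rate should be close to the noiseless value
# 1 / log((mu - reset) / (mu - theta)).
rate = simulate_firing_rate(mu=30.0, sigma=1.0, theta=20.0, reset=10.0)
deterministic = 1.0 / math.log((30.0 - 10.0) / (30.0 - 20.0))
print(rate, deterministic)
```

Counting the resets per unit time is the simulation counterpart of the probability current at the boundary $u = \theta$ mentioned above.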

5.4 Fokker-Planck equation. 

5.4.1 Closed form equation 

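Note $p(u,t)$ the probability density of the solution $u(t)$ of (48). Define 

$$\mu(t) = \mu_{net}(t) + \mu_{ext}(t)$$
$$\sigma(t) = \sqrt{\sigma_{net}(t)^2 + \sigma_{ext}(t)^2}$$

Then $p(u,t)$ is solution of the Fokker-Planck equation for diffusion process for $u < \theta$ and $u \neq \vartheta$: 

$$\frac{\partial p}{\partial t}(u,t) = \frac{\sigma(t)^2}{2}\frac{\partial^2 p}{\partial u^2}(u,t) + \frac{\partial}{\partial u}\left[(u-\mu(t))\,p(u,t)\right] \qquad (49)$$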
The tunneling effect from $\theta$ to $\vartheta$ is taken into account in the following boundary conditions 

$$\begin{cases} p(\theta,t) = 0 \\ \frac{\partial p}{\partial u}(\vartheta+0,t) = \frac{\partial p}{\partial u}(\vartheta-0,t) + \frac{\partial p}{\partial u}(\theta-0,t) \end{cases} \qquad (50)$$

This corresponds to a re-injection of the outgoing probability current $j(\theta,t)$ at $u = \vartheta$, where 
$j = \frac{\partial p}{\partial u}$. Thus $j(\vartheta+0,t) = j(\vartheta-0,t) + j(\theta,t)$. The outgoing current (re-injected current) is, by 
definition, the average firing rate defined by 

$$\nu(t) = \frac{\partial p}{\partial u}(\theta-0,t) \qquad (51)$$

5.4.2 Stationary solution 

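It is easy to find the stationary solution of the previous equation 

$$\frac{\partial p}{\partial t} = 0$$

Suppose a given constant firing rate $\nu_0$, then set 

$$\begin{cases} \mu_0 = -CJ\nu_0\tau + \mu_{ext} \\ \sigma_0 = \sqrt{CJ^2\nu_0\tau + \sigma_{ext}^2} \end{cases} \qquad (52)$$

and plug it into the differential second order equation 

$$\frac{\sigma_0^2}{2}\frac{d^2p}{du^2} + \frac{d}{du}\left[(u-\mu_0)\,p(u)\right] = 0 \qquad (53)$$

with the following boundary conditions 

$$\begin{cases} p(\theta) = 0 \\ \frac{dp}{du}(\vartheta+0) = \frac{dp}{du}(\vartheta-0) + \frac{dp}{du}(\theta-0) \end{cases} \qquad (54)$$

One obtains easily the following stationary distribution 

For $u < \vartheta$, $p(u) = \frac{2\nu_0\tau}{\sigma_0}\,e^{-y_u^2}\int_{y_\vartheta}^{y_\theta} e^{y^2}dy$ 
For $u > \vartheta$, $p(u) = \frac{2\nu_0\tau}{\sigma_0}\,e^{-y_u^2}\int_{y_u}^{y_\theta} e^{y^2}dy$ 

where $y_u = \frac{u-\mu_0}{\sigma_0}$, $y_\vartheta = \frac{\vartheta-\mu_0}{\sigma_0}$ and $y_\theta = \frac{\theta-\mu_0}{\sigma_0}$. 

Then the normalization condition $\int p(u)\,du = 1$ allows to infer 

$$\frac{1}{\nu_0\tau} = 2\int_{y_\vartheta}^{y_\theta} e^{w^2}\left(\int_{-\infty}^{w} e^{-y^2}dy\right)dw \qquad (55)$$

The relations (52,55) allow to compute $\nu_0$ numerically. The equation (55) can be approximately 
solved in the situation where the fluctuations $\sigma_0$ are weak (i.e. $y_\theta \gg 1$, which means 
that the spiking events are rare). In this case: 

$$\nu_0\tau \approx \frac{y_\theta}{\sqrt{\pi}}\,e^{-y_\theta^2} \qquad (56)$$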
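
The self-consistent computation of $\nu_0$ can be sketched numerically. The code below is a hedged illustration, not the procedure of [9]: it assumes that the normalization condition takes the form $\frac{1}{\nu_0\tau} = 2\int_{y_\vartheta}^{y_\theta} e^{w^2}\int_{-\infty}^{w} e^{-y^2}dy\,dw = \sqrt{\pi}\int_{y_\vartheta}^{y_\theta} e^{w^2}\,\mathrm{erfc}(-w)\,dw$, evaluates it with a composite Simpson rule, and solves for $\nu_0$ by bisection using the drift and diffusion of (52). The function names and all parameter values are hypothetical, and moderate values of $y_\vartheta$ are assumed so that $e^{w^2}$ does not overflow.

```python
import math


def stationary_rate(mu0, sigma0, theta, reset, tau, n=2000):
    """Rate nu0 from 1/(nu0*tau) = sqrt(pi) * int_{y_reset}^{y_theta} exp(w^2) erfc(-w) dw."""
    y_theta = (theta - mu0) / sigma0
    y_reset = (reset - mu0) / sigma0
    if y_theta > 26.0:  # exp(w^2) would overflow; the rate is numerically zero there
        return 0.0
    f = lambda w: math.exp(w * w) * math.erfc(-w)
    h = (y_theta - y_reset) / n
    total = f(y_reset) + f(y_theta)
    for i in range(1, n):  # composite Simpson rule (n is even)
        total += (4 if i % 2 else 2) * f(y_reset + i * h)
    return 1.0 / (tau * math.sqrt(math.pi) * total * h / 3.0)


def self_consistent_rate(mu_ext, sigma_ext, c, j, theta, reset, tau):
    """Solve nu = stationary_rate(mu0(nu), sigma0(nu)) by bisection (inhibitory feedback)."""
    def gap(nu):
        mu0 = -c * j * nu * tau + mu_ext
        sigma0 = math.sqrt(c * j * j * nu * tau + sigma_ext ** 2)
        return stationary_rate(mu0, sigma0, theta, reset, tau) - nu

    lo, hi = 0.0, 1e-3
    while gap(hi) > 0.0:  # inhibition lowers the drift, so gap eventually turns negative
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def arrhenius_rate(mu0, sigma0, theta, tau):
    """Weak-noise asymptotic rate: nu0 * tau ~ y_theta * exp(-y_theta^2) / sqrt(pi)."""
    y_theta = (theta - mu0) / sigma0
    return y_theta * math.exp(-y_theta ** 2) / (tau * math.sqrt(math.pi))


# Assumed values: potentials in mV, times in ms, rates in spikes per ms.
nu0 = self_consistent_rate(mu_ext=25.0, sigma_ext=5.0, c=1000, j=0.1,
                           theta=20.0, reset=10.0, tau=20.0)
print(nu0)
# Rare-spike regime (y_theta = 4): the two expressions should be close.
print(stationary_rate(0.0, 5.0, 20.0, 10.0, 20.0), arrhenius_rate(0.0, 5.0, 20.0, 20.0))
```

Bisection is used because, with purely inhibitory feedback, the right-hand side decreases with $\nu_0$, so the fixed point is unique.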

This asymptotic expression can be compared to the escape probability from the equation of 
motion of a particle in a parabolic potential well $V$, with minimum $\mu_0$, submitted to a Brownian 
excitation 

$$\tau\,dV_t = -(V - \mu_0)\,dt + \sigma_0\,dB_t$$

The time rate to reach $V = \theta$ is thus given by the Arrhenius time 

$$\nu_0\tau \sim e^{-y_\theta^2}$$

Numerical values of $\nu_0$ which are inferred from equations (55) and (56) are compared in [9] 
to the result of numerical simulations of the network and there is a good agreement between 
theoretical predictions and simulated firing rates. 

Fig. 4. Sketch of the bifurcation diagram of the model (10,44) when varying the parameters $\mu_{ext}$, $\sigma_{ext}$ 
controlling the Poisson process of external excitation. SS means Stationary State, while OS means 
Oscillatory State. The solid line represents the instability line for $D = 0.1\tau$. (Drawn by hand from [9]) 



5.4.3 Stability analysis. 



The stability analysis for the stationary solution uses normal form techniques similar to those 
described in paper I, but in an infinite dimensional space. The Fokker-Planck equation is rescaled 
and expanded around the steady-state solution. This intricate computation is fully detailed in 
[9]. We simply focus on the results. 

The Authors find that there is a bifurcation of Hopf type for the stationary solution. Thus, 
for a certain parameter range, the system exhibits synchronized oscillations of the neurons. 
A sketch of the bifurcation map is given in figure 4 when varying the parameters $\mu_{ext}$, $\sigma_{ext}$ 
controlling the external excitation. 

One can see from that bifurcation diagram that the bifurcation occurs when the drift of the 
external input is increasing. Conversely, an increase of the dispersion of the external input 
stabilizes the steady state. If the external input consists in the superposition of i.i.d. Poisson 
processes as it was detailed above, then the increase of their common frequency $\nu_{ext}$ induces 
the occurrence of an oscillatory regime. There is still a good agreement between the predictions 
of mean-field theory and the results of simulations. 



5.5 Conclusion 



Thus, the conclusion is that in this model of a neural network of sparsely connected 
inhibitory integrate-and-fire neurons, submitted to an external excitatory Poisson process, and 
emitting spikes irregularly at a low rate, there is, in the thermodynamic limit, a sharp transition 
between a regime where the average global activity is constant and a state where neurons are 
weakly synchronized. The activity becomes oscillatory when the inhibitory feedback is strong 
enough. Note that the period of the global oscillations depends on the synaptic transmission 
delay which cannot be neglected. 

Finally, let us mention that the Authors performed a finite size analysis of the model and 
found that global oscillations of finite coherence time generically exist above and below the 
critical inhibition threshold. 

6 Appendix about probability theory 

This paper uses intensively some classical notations and concepts coming from probability 
theory. The proofs are omitted but sometimes the results follow from advanced results of this 
theory. It is not possible to recall here the necessary prerequisites. There are excellent books 
about probability theory for physicists and engineers such as [30]. We just want here to recall 
some notations and some results from convergence theory. We have detailed the proof of the 
"finite-time Girsanov theorem" since it is a crucial result for the paper. 

6.1 Elementary Notations 

The classical and shortest point of view for considering random phenomena, dating from the 19th 
century, is to consider a random variable $x$ in a space $\mathcal{E}$ via its probability law on that space. All 
the moments can be computed by integration over the probability law of the random variable. 
For instance, if $\mu$ is the probability law of the real random variable $x$, one has 

$$E(x) = \int x\,d\mu(x)$$

$$E(x^2) = \int x^2\,d\mu(x)$$

and more generally for any bounded continuous function $\phi$ of $x$ 

$$E[\phi(x)] = \int \phi(x)\,d\mu(x)$$

where $E$ is the mathematical expectation operator. The expectation of a random variable with values 
in a topological vector space $\mathcal{F}$ is a vector of $\mathcal{F}$. The mathematical expectation operator is linear. 
Moreover, for a random vector $x \in \mathbb{R}^d$ the expectation $E(x) \in \mathbb{R}^d$ is defined by 

$$\forall i \in \{1,\dots,d\},\quad \{E(x)\}_i = E(x_i) = \int_{\mathbb{R}^d} x_i\,d\mu(x)$$

and the symmetric $(d,d)$-covariance matrix is given by 

$$\mathrm{Cov}(x)_{ij} = E(x_i x_j) - E(x_i)E(x_j)$$

where $\mu$ is the probability law of $x$. 

Actually, this point of view cannot be used when we are obliged to consider an infinite set of 
random variables or when we want to operate a variable change. Hence, we are obliged to adopt 
a more general point of view which was initiated by Kolmogorov in 1933. This approach relies 
basically upon the consideration of a very large state space $\Omega$ which describes all the possible 
outcomes or states of the world. Then a rich family $\mathcal{A}$ of subsets of $\Omega$ is defined such that all 
the random events of interest belong to $\mathcal{A}$. Eventually a probability measure is defined 
on $\mathcal{A}$ which associates to any random event $A \in \mathcal{A}$ its probability $P(A)$. The triple $(\Omega,\mathcal{A},P)$ 
is called a probability space. 

Later on, we shall have to work on infinite-dimensional spaces. So let us fix a general framework. 

Definition 6 A Polish space $\mathcal{F}$ is a metric complete (every Cauchy sequence converges) and 
separable (there is a countable dense subset) space. The $\sigma$-algebra $\mathcal{B}$ of Borel subsets of a Polish 
space $\mathcal{F}$ is the smallest $\sigma$-algebra that contains the open sets. Given a probability measure $\mu$ on 
the Borel subsets of $\mathcal{F}$ it is possible to integrate any bounded continuous function $\phi$ on $\mathcal{F}$ and 
the integral is noted $\int_{\mathcal{F}} \phi(\xi)\,d\mu(\xi)$. The integral may be extended to a wider class of functions. 
These functions are called integrable with respect to $\mu$. 

In that new framework let us define random variables in $\mathcal{F}$. 

Definition 7 Let $(\Omega,\mathcal{A},P)$ be a probability space and $(\mathcal{F},\mathcal{B})$ a Polish space endowed with its 
Borel $\sigma$-algebra. A random variable $x \in \mathcal{F}$ is a state function from $\Omega$ into $\mathcal{F}$ such that for any 
open set $B$ in $\mathcal{F}$, the subset of $\Omega$ defined by 

$$(x \in B) = \{\omega \in \Omega \text{ such that } x(\omega) \in B\}$$

belongs to $\mathcal{A}$ so its probability $P(x \in B)$ is well defined. 

The probability law of a random variable $x \in \mathcal{F}$ is the probability law on $\mathcal{F}$ which associates 
to any Borel subset $B \subset \mathcal{F}$ the probability $P(x \in B)$. 

The probability law of $x$ is noted $x.P$ or $P_x$. This definition also stands for more general 
measures than probability laws, such as volume measures. More generally we have 

Definition 8 Let $(\Omega,\mathcal{A},P)$ be a measure space and $x$ a mapping from $\Omega$ to $\mathcal{F}$ such that 

$$\forall B \in \mathcal{B},\quad (x \in B) = \{\omega \in \Omega \text{ such that } x(\omega) \in B\} \in \mathcal{A}$$

Then we define a measure on $(\mathcal{F},\mathcal{B})$ that is noted $x.P$ or $P_x$ by 

$$B \in \mathcal{B} \to x.P(B) = P_x(B) = P(x \in B)$$

This measure is called the image of the measure $P$ by the mapping $x$. 

This definition is completed by the following transfer theorem which shows that the mathematical 
expectation can be computed on the state space $\Omega$ or on the value space $\mathcal{F}$. 

Theorem 14 For any function $\phi$ defined on $\mathcal{F}$ and integrable for the probability law $P_x$ we 
have 

$$E[\phi(x)] = \int_\Omega \phi[x(\omega)]\,dP(\omega) = \int_{\mathcal{F}} \phi(\xi)\,dP_x(\xi)$$

The transfer theorem is very useful in theory and in practice. It allows to define the mathematical 
expectation of a random variable without any ambiguity. 

Kolmogorov's framework allows to define independent random variables by the following 
equivalent properties 

Definition 9 For $i \in \{1,\dots,n\}$ let $x_i \in \mathcal{F}_i$ be random variables. They are said independent if 
the law $P_x$ of the random variable $x = (x_1,\dots,x_n) \in \mathcal{F}_1 \times \dots \times \mathcal{F}_n$ is the product of the $P_{x_i}$, 
which is expressed in the following equivalent properties 

$$P(x \in B_1 \times \dots \times B_n) = P_{x_1}(B_1)\dots P_{x_n}(B_n)$$
$$E[\phi_1(x_1)\dots\phi_n(x_n)] = E[\phi_1(x_1)]\dots E[\phi_n(x_n)]$$

6.2 Density and Gaussian random vectors 

Definition 10 Let $(\Omega,\mathcal{A},m)$ be a measure space and $h$ an integrable positive function on $\Omega$ such 
that $\int_\Omega h(\omega)\,dm(\omega) = 1$. Then we can define a probability measure $Q$ on $(\Omega,\mathcal{A})$ by 

$$Q(A) = \int_A h(\omega)\,dm(\omega)$$

$Q$ is said absolutely continuous with respect to $m$, $h$ is called the density of $Q$ with respect to 
$m$ and we can compute the integral for $Q$ by using the formula 

$$\int_\Omega \phi(\omega)\,dQ(\omega) = \int_\Omega \phi(\omega)\,h(\omega)\,dm(\omega)$$

We write $\frac{dQ}{dm}(\omega) = h(\omega)$ or $dQ(\omega) = h(\omega)\,dm(\omega)$. 

Of course, the density functions are commonly used in elementary probability. An important 
class of probability measures is the Gaussian probability family. 

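Definition 11 Let $a \in \mathbb{R}$ and $\sigma^2 \in \mathbb{R}^+$. The Gaussian probability measure $\gamma = \mathcal{N}(a,\sigma^2)$ is 
defined by its density with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}$, which is 

$$\frac{d\gamma}{d\lambda}(\xi) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(\xi-a)^2}{2\sigma^2}\right]$$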

Similarly, this definition can be extended to d-dimensional vector space and even to 
infinite-dimensional Hilbert space. Here, we just need the following 

Definition 12 Let $\theta \in \mathbb{R}^d$ and $K$ be a $d \times d$ symmetric positive matrix, then there exists one 
and only one probability measure on $\mathbb{R}^d$, which is called the Gaussian probability $\gamma = \mathcal{N}(\theta,K)$, 
such that if $\gamma$ is the probability law of the random vector $x \in \mathbb{R}^d$ then $\forall u \in \mathbb{R}^d$, the law of the 
random variable $u^t x$ $^{12}$ is $\mathcal{N}(u^t\theta, u^t K u)$. 

Proposition 15 Let $x$ be a random vector with regular Gaussian probability $\gamma = \mathcal{N}(\theta,K)$ then 
we have 

$$\begin{cases} E(x) = \int \xi\,d\gamma(\xi) = \theta \\ \mathrm{Cov}(x) = E(xx^t) - E(x)E(x)^t = K \end{cases}$$

So a Gaussian law is completely determined by its expectation and its covariance matrix. 

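Definition 13 With the previous notations, if $K$ is invertible, $\gamma$ is said to be regular and the 
density of $\gamma$ with respect to the Lebesgue measure $\lambda$ is 

$$\frac{d\gamma}{d\lambda}(\xi) = \frac{1}{\sqrt{(2\pi)^d\,\mathrm{Det}(K)}}\exp\left[-\frac{(\xi-\theta)^t K^{-1}(\xi-\theta)}{2}\right] \qquad (57)$$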



A common property of the Gaussian family is its stability by linear transforms and translation. 
More precisely, we have 

Proposition 16 Let $x$ be a Gaussian random vector which takes its value in the vector space $E$ 
and $A$ a linear mapping of $E$ into $F$. Then $y = Ax$ is a Gaussian random vector in $F$ and 

$$\begin{cases} E(y) = A\,E(x) \\ \mathrm{Cov}(y) = A\,\mathrm{Cov}(x)\,A^t \end{cases} \qquad (58)$$

Proposition 17 Let $x$ be a Gaussian random vector which takes its value in the vector space $E$ 
and $a \in E$. Then $y = x + a$ is a Gaussian random vector in $E$ and 

$$\begin{cases} E(y) = E(x) + a \\ \mathrm{Cov}(y) = \mathrm{Cov}(x) \end{cases} \qquad (59)$$



Corollary 18 Let $x$ be a random vector with regular Gaussian probability $\gamma = \mathcal{N}(\theta,K)$ and let 
$a \in \mathbb{R}^d$, then the law $\gamma_a$ of $x + a$ is the regular Gaussian law $\mathcal{N}(\theta + a, K)$ and its density with 
respect to $\gamma$ can be written as follows 

$$\frac{d\gamma_a}{d\gamma}(\xi) = \exp\left\{a^t K^{-1}(\xi - \theta) - \frac{1}{2}a^t K^{-1}a\right\} \qquad (60)$$

PROOF : The formula is checked using an easy and straightforward computation from the 
expression of the Gaussian density ■ 
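
The computation behind Corollary 18 can be verified numerically in dimension one. The sketch below is an illustration under the assumption that the density of the shifted law with respect to $\gamma$ is $\exp\{a^tK^{-1}(\xi-\theta) - \frac{1}{2}a^tK^{-1}a\}$, which for $d = 1$ and $K = \sigma^2$ reads $\exp\{a(\xi-\theta)/\sigma^2 - a^2/(2\sigma^2)\}$. The function names and numerical values are hypothetical.

```python
import math


def gaussian_pdf(xi, mean, var):
    """Density of N(mean, var) with respect to the Lebesgue measure on R."""
    return math.exp(-(xi - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def shift_density(xi, theta, var, a):
    """Density of N(theta + a, var) with respect to N(theta, var), one-dimensional case."""
    return math.exp(a * (xi - theta) / var - a * a / (2.0 * var))


theta, var, a = 1.0, 2.0, 0.7   # assumed illustrative values
for xi in (-3.0, 0.0, 1.0, 2.5):
    ratio = gaussian_pdf(xi, theta + a, var) / gaussian_pdf(xi, theta, var)
    print(xi, ratio, shift_density(xi, theta, var, a))
```

The two printed columns coincide up to rounding, since the quadratic terms in $\xi$ cancel in the ratio of the two Gaussian densities.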



It is interesting to note that it is possible to define Gaussian probability on an 
infinite-dimensional vector space though it is not possible to define Lebesgue measure. However, in 
this paper we just use finite-dimensional Gaussian probabilities. An interesting property of the 
Gaussian measure, which is crucial in this paper, is the following finite-dimensional version of 
the Girsanov theorem [23]. 

$^{12}$ $u^t$ is the transpose of column vector $u$, so $u^t x$ is the scalar product of vectors $u$ and $x$ 

Theorem 19 Let $m_0$ be a probability measure on $\mathbb{R}^d$ and let $\mathcal{N}(a,K)$ be a Gaussian regular 
probability on $\mathbb{R}^d$. Let $T$ be a positive integer and $\mathcal{E}_T = (\mathbb{R}^d)^{\{0,\dots,T\}}$ the space of finite time 
trajectories in $\mathbb{R}^d$. Let $w$ be a random vector in $\mathcal{E}_T$ with law $m_0 \otimes \mathcal{N}(a,K)^{\otimes T}$. Let $\phi$ 
and $\psi$ be two measurable applications of $\mathbb{R}^d$ into $\mathbb{R}^d$. Then we define the random vectors $x$ and $y$ 
in $\mathcal{E}_T$ by 

$$\begin{cases} x(0) = w(0) \\ x(t+1) = \phi[x(t)] + w(t+1) \end{cases}$$

$$\begin{cases} y(0) = w(0) \\ y(t+1) = \psi[y(t)] + w(t+1) \end{cases}$$

Let $P$ and $Q$ be the respective probability laws on $\mathcal{E}_T$ of $x$ and $y$, then $Q$ is absolutely 
continuous with respect to $P$ and we have 

$$\frac{dQ}{dP}(\eta) = \exp\sum_{t=0}^{T-1}\left\{-\frac{1}{2}\{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1}\{\psi[\eta(t)] - \phi[\eta(t)]\} + \{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1}\{\eta(t+1) - a - \phi[\eta(t)]\}\right\} \qquad (61)$$
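
A Monte Carlo sanity check of this change of law can be sketched in dimension one. The code is illustrative only: it assumes $d = 1$, $K = k$ scalar, a standard Gaussian initial law $m_0$, and the hypothetical maps $\phi(v) = 0.5\tanh(v)$ and $\psi(v) = \phi(v) + 0.3$. It evaluates the exponent of the density along trajectories of $x$ and checks two consequences: the density has mean $1$ under $P$, and reweighting $x(T)$ by the density reproduces the mean of $y(T)$.

```python
import math
import random


def log_density_ratio(eta, phi, psi, a, k):
    """Exponent of dQ/dP along the trajectory eta (scalar case, K = k)."""
    total = 0.0
    for t in range(len(eta) - 1):
        zeta = psi(eta[t]) - phi(eta[t])
        innovation = eta[t + 1] - a - phi(eta[t])
        total += -0.5 * zeta * zeta / k + zeta * innovation / k
    return total


def simulate(f, noise):
    """Trajectory z(0) = noise[0], z(t+1) = f(z(t)) + noise[t+1]."""
    traj = [noise[0]]
    for t in range(1, len(noise)):
        traj.append(f(traj[-1]) + noise[t])
    return traj


def girsanov_check(n_samples=20000, horizon=5, a=0.0, k=1.0, seed=0):
    rng = random.Random(seed)
    phi = lambda v: 0.5 * math.tanh(v)          # assumed illustrative dynamics
    psi = lambda v: 0.5 * math.tanh(v) + 0.3    # assumed perturbed dynamics
    sum_w = sum_xw = sum_y = 0.0
    for _ in range(n_samples):
        # w(0) drawn from m_0 = N(0, 1) (assumption), then w(t) ~ N(a, k).
        noise = [rng.gauss(0.0, 1.0)]
        noise += [rng.gauss(a, math.sqrt(k)) for _ in range(horizon)]
        x = simulate(phi, noise)
        y = simulate(psi, noise)
        weight = math.exp(log_density_ratio(x, phi, psi, a, k))
        sum_w += weight
        sum_xw += x[-1] * weight
        sum_y += y[-1]
    return sum_w / n_samples, sum_xw / n_samples, sum_y / n_samples


mean_weight, weighted_mean, direct_mean = girsanov_check()
print(mean_weight)                 # should be close to 1
print(weighted_mean, direct_mean)  # should be close to each other
```

The perturbation $\psi - \phi$ is kept small on purpose: the variance of the weights grows exponentially with the horizon and with the size of the perturbation, which is a practical limitation of such reweighting estimates.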

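PROOF : The proof is a recursion on $T$. It is easy to check (61) for $T = 1$. To reduce the 
expression let us write down 

$$y_0^T = (y(0),\dots,y(T)),\quad \eta_0^T = (\eta(0),\dots,\eta(T))$$

and 

$$\Theta_T(\eta_0^T) = \sum_{t=0}^{T-1}\left\{-\frac{1}{2}\{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1}\{\psi[\eta(t)] - \phi[\eta(t)]\} + \{\psi[\eta(t)] - \phi[\eta(t)]\}^t K^{-1}\{\eta(t+1) - a - \phi[\eta(t)]\}\right\}$$

Suppose (61) is true up to $T$ and let us compute the density of $y$ up to $T + 1$. Let $h$ be a 
bounded continuous test function defined on $\mathcal{E}_{T+1}$. We have by conditioning with respect to $y_0^T$ 

$$E\left[h(y(T+1), y_0^T)\right] = \int E\left\{h(w(T+1) + \psi[\eta(T)], \eta_0^T)\right\}dQ(\eta_0^T)$$

where the expectation is taken with respect to $w(T+1)$, which is independent from $y_0^T$. Let us 
make explicit the Gaussian law $\mathcal{N}(a,K)$ and use the recursion hypothesis: 

$$E\left[h(y(T+1), y_0^T)\right] = C_K \iint h(\omega + \psi[\eta(T)], \eta_0^T)\exp\left\{-\frac{1}{2}(\omega - a)^t K^{-1}(\omega - a)\right\}\exp\Theta_T(\eta_0^T)\,d\omega\,dP(\eta_0^T)$$

where $C_K$ is the classic normalization constant for the Gaussian law. Then let us perform the 
translation $\varpi = \omega + \psi[\eta(T)]$, it gives 

$$E\left[h(y(T+1), y_0^T)\right] = C_K \iint h(\varpi, \eta_0^T)\exp\left\{-\frac{1}{2}(\varpi - a - \psi[\eta(T)])^t K^{-1}(\varpi - a - \psi[\eta(T)])\right\}\exp\Theta_T(\eta_0^T)\,d\varpi\,dP(\eta_0^T)$$

To simplify notations let us write down $\zeta_T = \psi[\eta(T)] - \phi[\eta(T)]$, we have 

$$E\left[h(y(T+1), y_0^T)\right] = C_K \iint h(\varpi, \eta_0^T)\exp\left\{-\frac{1}{2}(\varpi - a - \phi[\eta(T)] - \zeta_T)^t K^{-1}(\varpi - a - \phi[\eta(T)] - \zeta_T)\right\}\exp\Theta_T(\eta_0^T)\,d\varpi\,dP(\eta_0^T)$$

Let us develop the quadratic form in the exponential 

$$-\frac{1}{2}(\varpi - a - \phi[\eta(T)] - \zeta_T)^t K^{-1}(\varpi - a - \phi[\eta(T)] - \zeta_T) = -\frac{1}{2}(\varpi - a - \phi[\eta(T)])^t K^{-1}(\varpi - a - \phi[\eta(T)]) - \frac{1}{2}\zeta_T^t K^{-1}\zeta_T + \zeta_T^t K^{-1}(\varpi - a - \phi[\eta(T)])$$

So we have 

$$\exp\left\{-\frac{1}{2}(\varpi - a - \phi[\eta(T)] - \zeta_T)^t K^{-1}(\varpi - a - \phi[\eta(T)] - \zeta_T)\right\} = \exp\left\{-\frac{1}{2}(\varpi - a - \phi[\eta(T)])^t K^{-1}(\varpi - a - \phi[\eta(T)])\right\}\exp\left\{-\frac{1}{2}\zeta_T^t K^{-1}\zeta_T + \zeta_T^t K^{-1}(\varpi - a - \phi[\eta(T)])\right\}$$

We obtain a product of two exponentials. The first one combines itself with $C_K\,d\varpi\,dP(\eta_0^T)$ to 
give $dP(\eta_0^{T+1})$; the second one combines itself with $\exp\Theta_T(\eta_0^T)$ to give $\exp\Theta_{T+1}(\eta_0^{T+1})$. So we 
get eventually 

$$E\left[h(y_0^{T+1})\right] = \int h(\eta_0^{T+1})\exp\Theta_{T+1}(\eta_0^{T+1})\,dP(\eta_0^{T+1})$$

■ 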



6.3 Convergence of random variables 

The definition of probability is based upon the law of large numbers (LLN). This last result 
may be roughly formulated as follows: 

When $(x_n)$ is an independent $^{13}$ sequence of random variables with the same probability law $p$ 
with two first moments $c = \int x\,dp(x)$ and $k = \int x^2\,dp(x)$ then the sequence of empirical averages 
$\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ converges towards $c$. 

This statement is not precise. The convergence may have several senses. Some useful conver- 
gence concepts in probability theory are the convergence in law, the convergence in probability 
and the almost sure convergence. Let us recall their definition. 

Definition 14 Let $(x_n)$ and $x$ be random variables on a probability space $(\Omega,\mathcal{A},P)$. The 
sequence of random variables $(x_n)$ is said to 

— converge in law to $x$ if and only if for any continuous bounded function $h$, $E[h(x_n)] \to E[h(x)]$ $^{14}$ 

— converge in probability to $x$ if and only if 

$$\forall \epsilon > 0,\quad P(|x_n - x| > \epsilon) \to 0$$

— converge almost surely to $x$ if and only if 

$$\exists N \subset \Omega \text{ with } P(N) = 0 \text{ such that } \forall \omega \notin N,\ x_n(\omega) \to x(\omega)$$

These definitions are stronger and stronger. Almost sure convergence implies convergence in 
probability which implies in turn convergence in law. Most mean-field computations of mean- 
field equations in random neural networks use the convergence of Fourier transforms through 
a Laplace limit integral ensuring convergence in law. 

However, from the point of view of practitioners, almost sure convergence is more pleasant 
because a single realization of the sequence $(x_n)$ allows to check the convergence. To check the 
weaker convergence statements, a lot of realizations of the sequence are necessary. 

Let us return to the law of large numbers. The convergence in probability of the sequence 
$(\bar{x}_n)$ is especially easy to show since $E(\bar{x}_n) = c$ and $\mathrm{Var}(\bar{x}_n) = \frac{k - c^2}{n}$. Then one has just to write 
the Bienaymé-Tchebychev inequality 

$$P(|\bar{x}_n - c| > \epsilon) \le \frac{k - c^2}{n\epsilon^2}$$

But this rate of convergence is not fast enough to show the almost sure convergence (the so-called 
strong law of large numbers). 
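
The looseness of the Bienaymé-Tchebychev bound is easy to observe numerically. The following sketch is an illustration with assumed choices: the common law is uniform on $[0,1]$ (so $c = 1/2$ and $k = 1/3$), and the function names are hypothetical. It compares the empirical frequency of a deviation of the empirical average with the bound $(k - c^2)/(n\epsilon^2)$.

```python
import random


def deviation_frequency(n, eps, n_trials=2000, seed=0):
    """Fraction of trials where the mean of n uniform draws deviates from 1/2 by more than eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            hits += 1
    return hits / n_trials


def chebyshev_bound(n, eps):
    """Bound (k - c^2) / (n eps^2) for the uniform law on [0, 1]."""
    variance = 1.0 / 3.0 - 0.25
    return variance / (n * eps * eps)


frequency = deviation_frequency(n=100, eps=0.1)
bound = chebyshev_bound(n=100, eps=0.1)
print(frequency, bound)
```

The observed frequency is typically far below the bound, which is consistent with the remark of the next section that the convergence in probability is in fact much faster than the inequality suggests.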



$^{13}$ Such a sequence is called an i.i.d. sequence 

$^{14}$ An equivalent condition is the convergence of their characteristic functions (or Fourier transforms): 

$$\forall t \in \mathbb{R},\quad E(\exp(itx_n)) \to E(\exp(itx))$$






6.4 Large deviation principle 

6.4.1 Cramer's theorem 

One way to obtain the strong law is to show that the convergence in probability occurs much 
faster than it appears from the Bienaymé-Tchebychev inequality. 

Actually the following theorem was obtained by Cramér in the late 30's: 

Theorem 20 Let $(x_n)$ be a sequence of i.i.d. random variables with probability law $\mu$ such that 
$\int \xi\,d\mu(\xi) = \theta$. Then we have 

$$\forall a > \theta,\quad \frac{1}{n}\log P(\bar{x}_n > a) \to -I(a)$$

where 

$$I(a) = \max_{p \in \mathbb{R}}\ \{pa - \log E[\exp(px)]\}$$

This theorem can be extended to more general settings. It is the subject of large deviation 
theory. Let us first consider the case of finite-dimensional random vectors [29]. 
The following proposition is easy to prove: 

Proposition 21 Let $\mu$ be a probability law on $\mathbb{R}^d$ such that for all $p \in \mathbb{R}^d$, $\Lambda(p) = \log E[\exp(p^t x)]$ 
exists. The function $\Lambda$ is called the log-generating function of $\mu$. We define its Legendre 
transform $\Lambda^*$ on $\mathbb{R}^d$ as follows: 

$$\Lambda^*(a) = \sup_{p \in \mathbb{R}^d}\left[(p^t a) - \Lambda(p)\right]$$

then 

a) $\Lambda^*$ is a convex function (with $\infty$ as a possible value) 

b) $\forall a \in \mathbb{R}^d$, $\Lambda^*(a) \ge 0$ 

c) $a = \int \xi\,d\mu(\xi) \Rightarrow \Lambda^*(a) = 0$ 

PROOF : a) is straightforward, since the supremum of convex functions is convex. 

b) comes from $\Lambda(0) = 0$: taking $p = 0$ in the supremum gives $\Lambda^*(a) \ge 0$. 

c) comes from Jensen's inequality, which gives $\Lambda(p) \ge p^t E(x)$, together with b). 

■ 

Then we can state Cramér's theorem for i.i.d. sequences of finite-dimensional random 
vectors: 

Theorem 22 Cramér's theorem: 

Let $(x_n)$ be a sequence of i.i.d. random vectors with a probability distribution $\mu$ according to 
the assumption and the notations of the previous proposition. Then for any Borel subset $B$ of 
$\mathbb{R}^d$, we have 

$$-\inf_{a \in B^\circ}\Lambda^*(a) \le \liminf_{n\to\infty}\frac{1}{n}\log\left[P(\bar{x}_n \in B^\circ)\right] \le \limsup_{n\to\infty}\frac{1}{n}\log\left[P(\bar{x}_n \in \bar{B})\right] \le -\inf_{a \in \bar{B}}\Lambda^*(a) \qquad (62)$$

where $B^\circ$ is the interior set of $B$ (the greatest open subset of $B$) and $\bar{B}$ is the closure of $B$ (the 
smallest closed extension of $B$). 
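
Cramér's theorem can be illustrated on a case where everything is computable. The sketch below makes assumptions that are not in the source: the law is Bernoulli with parameter $q$, for which $\Lambda(p) = \log(1 - q + qe^p)$ and $\Lambda^*(a) = a\log\frac{a}{q} + (1-a)\log\frac{1-a}{1-q}$; the Legendre transform is obtained by a ternary search on the concave function $p \mapsto pa - \Lambda(p)$; and the exact binomial tail is evaluated in log scale. All function names are hypothetical.

```python
import math


def legendre_transform(log_mgf, a, lo=-50.0, hi=50.0, iters=200):
    """sup_p [p*a - log_mgf(p)] by ternary search (the objective is concave in p)."""
    objective = lambda p: p * a - log_mgf(p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def bernoulli_log_mgf(q):
    """Log-generating function of the Bernoulli(q) law."""
    return lambda p: math.log(1.0 - q + q * math.exp(p))


def log_tail_probability(n, q, a):
    """log P(mean of n Bernoulli(q) draws > a), computed exactly in log scale."""
    k_min = math.floor(n * a) + 1
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q)
            for k in range(k_min, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


q, a = 0.5, 0.7   # assumed illustrative values
rate = legendre_transform(bernoulli_log_mgf(q), a)
closed_form = a * math.log(a / q) + (1.0 - a) * math.log((1.0 - a) / (1.0 - q))
print(rate, closed_form)
for n in (100, 500, 2000):
    print(n, -log_tail_probability(n, q, a) / n)   # approaches the rate as n grows
```

The gap between $-\frac{1}{n}\log P(\bar{x}_n > a)$ and $\Lambda^*(a)$ decays only like $\frac{\log n}{n}$, so fairly large $n$ is needed before the two agree to several digits.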

A consequence of Cramér's theorem is that for any closed subset $F$ in $\mathbb{R}^d$ such that 
$\inf_{a \in F}\Lambda^*(a) > 0$, $P(\bar{x}_n \in F)$ goes to $0$ exponentially fast when $n \to \infty$ and that the rate 
of convergence depends only on the value of $\Lambda^*$ at the point of $F$ where $\Lambda^*$ reaches its 
minimum. This point is called the dominating point. For regular probability distributions where 
$\Lambda^*$ is strictly convex, defined and continuous around $\theta = E(x)$, the exponential decay of finite 
deviations from the expectation (large deviations) and the strong law of large numbers are easy 
consequences. 

6.4.2 Large deviation principle in an abstract setting 

The convergence with an exponential rate is a general situation, which is characterized in the 
following general definitions: 

Definition 15 Let $\mathcal{E}$ be a Polish space and $I$ be a lower semi-continuous function of $\mathcal{E}$ into 
$[0,\infty]$. $I$ is called a rate function. If $I$ possesses the property of compact level set, i.e. 

$$\forall \epsilon > 0,\quad \{x \in \mathcal{E} \text{ such that } I(x) \le \epsilon\} \text{ is compact}$$

then $I$ is called a good rate function. 

Definition 16 Given a rate function $I$ on a Polish space $\mathcal{F}$ and a sequence of probability 
measures $Q_n$ on $\mathcal{F}$: 

— $(Q_n)$ satisfies the large deviation lower bound on open sets if 

$$\forall O \text{ open set in } \mathcal{F},\quad -\inf_{\xi \in O} I(\xi) \le \liminf_{n\to\infty}\frac{1}{n}\log[Q_n(O)] \qquad (63)$$

— $(Q_n)$ satisfies the large deviation upper bound on compact sets if 

$$\forall K \text{ compact set in } \mathcal{F},\quad \limsup_{n\to\infty}\frac{1}{n}\log[Q_n(K)] \le -\inf_{x \in K} I(x) \qquad (64)$$

— $(Q_n)$ satisfies the large deviation upper bound on closed sets if 

$$\forall C \text{ closed set in } \mathcal{F},\quad \limsup_{n\to\infty}\frac{1}{n}\log[Q_n(C)] \le -\inf_{x \in C} I(x) \qquad (65)$$

— If $(Q_n)$ satisfies the large deviation lower bound for open sets and the large deviation upper bound 
for compact sets we say that $(Q_n)$ satisfies the large deviation principle (LDP) with rate 
function $I$. 

— If $(Q_n)$ satisfies the large deviation lower bound for open sets and the large deviation upper bound 
for closed sets we say that $(Q_n)$ satisfies the full large deviation principle with rate function 
$I$. 

— $(Q_n)$ is said tight if for all $\epsilon > 0$, there exists a compact subset $K$ of $\mathcal{F}$ such that $Q_n({}^cK) < \epsilon$. 
If $(Q_n)$ is tight and satisfies a LDP, it satisfies the full LDP for the same rate function. 

The same definitions stand for a sequence of random elements in $\mathcal{F}$ if the sequence of their 
probability laws satisfies the respective bounds. 

A simpler way to state that $(Q_n)$ satisfies the full large deviation principle with rate function 
$I$ is to write that, for any Borel subset $B$ of $\mathcal{F}$, 

$$-\inf_{\xi \in B^\circ} I(\xi) \le \liminf_{n\to\infty}\frac{1}{n}\log[Q_n(B)] \le \limsup_{n\to\infty}\frac{1}{n}\log[Q_n(B)] \le -\inf_{\xi \in \bar{B}} I(\xi) \qquad (66)$$

Actually, the scope of Cramér's theorem may be widely extended and a full large deviation 
principle holds for the empirical mean of any i.i.d. random sequence in a Polish space under 
mild assumptions on the existence of the log-generating function [18]. The rate function of this 
LDP is the Legendre transform of the log-generating function. 



6.4.3 Varadhan theorem and Laplace principle 



An equivalent functional formulation of the full large deviation principle is due to Varadhan 
and is called by Dupuis and Ellis the Laplace principle ([20]). 

Definition 17 Let $I$ be a good rate function on the Polish space $\mathcal{F}$. The random sequence $(x_n)$ 
in $\mathcal{F}$ is said to satisfy the Laplace principle with rate function $I$ if for any continuous bounded 
function $h$ on $\mathcal{F}$ we have 

$$\lim_{n\to\infty}\frac{1}{n}\log E\{\exp[-n\,h(x_n)]\} = -\inf_{\xi \in \mathcal{F}}\{h(\xi) + I(\xi)\}$$

This approach is called by the authors of ([20]) the weak convergence approach to the 
theory of large deviations. The equivalence of the two approaches (Laplace principle for good 
rate functions and full large deviation principle with good rate functions) is expressed in a 
theorem of Varadhan and its converse. Their proofs are in ([20]). Handling continuous bounded 
test functions may be more practical than dealing with open and closed sets. In particular, 
it is very easy to show the following transfer theorem for the LDP when the law is 
changed. 

Theorem 23 Let $P_n$ and $Q_n$ be two sequences of probability measures on the Polish space $\mathcal{F}$, let 
$I$ be a good rate function on $\mathcal{F}$ and let $\Gamma$ be a continuous function on $\mathcal{F}$ such that 

(a) $Q_n \ll P_n$ and $\frac{dQ_n}{dP_n}(\xi) = \exp n\Gamma(\xi)$, 

(b) $(P_n)$ satisfies a full large deviation principle with rate function $I$, 

(c) $I - \Gamma$ is a good rate function, 

then $(Q_n)$ satisfies a full large deviation principle with rate function $I - \Gamma$. 

PROOF OF THEOREM: Using the weak large deviation approach and the strong hypothesis of 
the theorem, the proof is quite formal. Let $h$ be any continuous bounded test function on $\mathcal{F}$, 
from hypothesis (c) and ■ 



6.5 Convergence of random measures 

Let us have a second look at the law of large numbers. Since this law claims the convergence of 
the sequence of empirical averages $\frac{1}{n}\sum_{k=1}^{n} f(x_k)$ for any bounded continuous test function $f$, 
we are led to consider the empirical measure of a sample. 

Definition 18 Let $\xi = (\xi_1,\dots,\xi_n) \in \mathbb{R}^{nd}$ be a sequence of $n$ vectors of $\mathbb{R}^d$. We associate to $\xi$ 
the following probability measure $\mu_\xi \in \mathcal{P}(\mathbb{R}^d)$ 

$$\mu_\xi = \frac{1}{n}\sum_{k=1}^{n}\delta_{\xi_k}$$

$\mu_\xi$ is called the empirical measure associated to $\xi$. 

This definition says that if $A$ is a Borel subset of $\mathcal{F}$ then $\mu_N(u)(A)$ is the fraction of neurons 
whose state trajectory belongs to $A$. More practically, if $\phi$ is any continuous test function on $\mathcal{E}$, 
it says that 

$$\int_{\mathcal{E}} \phi(\eta)\,d\mu_N(u)(\eta) = \frac{1}{N}\sum_{i=1}^{N}\phi(u_i)$$
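
In code, integrating against an empirical measure is simply averaging over the sample. The short sketch below illustrates this with hypothetical names and a toy scalar sample; it is not tied to any particular network model.

```python
def empirical_integral(phi, sample):
    """Integral of phi against the empirical measure of the sample: the average of phi."""
    return sum(phi(u) for u in sample) / len(sample)


def empirical_mass(in_set, sample):
    """Mass given by the empirical measure to a set described by its indicator in_set."""
    return empirical_integral(lambda u: 1.0 if in_set(u) else 0.0, sample)


sample = [1.0, 2.0, 3.0, 4.0]                   # toy states u_1, ..., u_N (assumed values)
print(empirical_integral(lambda u: u, sample))   # mean of the sample
print(empirical_mass(lambda u: u > 2.0, sample)) # fraction of states above 2
```

The mass of a set is recovered as the integral of its indicator function, which is why the same averaging routine serves both purposes.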

With this definition, the convergence for each continuous bounded test function $f$ of $\frac{1}{n}\sum_{k=1}^{n} f(x_k)$ 
towards $\int f\,d\mu$ is exactly the narrow convergence of the sequence $\mu_{(x_1,\dots,x_n)}$ towards $\mu$. 

The set $\mathcal{P}(\mathbb{R}^d)$ of probability measures on $\mathbb{R}^d$ is a convex subset of the functional vector 
space $\mathcal{M}^1(\mathbb{R}^d)$ of bounded measures on $\mathbb{R}^d$. We endow $\mathcal{P}(\mathbb{R}^d)$ with the narrow topology for which 
$\mu_n \to \mu$ if and only if for all continuous and bounded test functions $f \in C_b(\mathbb{R}^d)$, $\int f\,d\mu_n \to \int f\,d\mu$. 
$\mathcal{P}(\mathbb{R}^d)$ is a Polish space for this topology. 

So instead of considering the random variable $x$ which takes its values in $\mathbb{R}^d$, we consider the 
random variable $\delta_x$ which takes its values in the Polish space $\mathcal{P}(\mathbb{R}^d)$. If $(x_k)$ is an i.i.d. sequence 
in $\mathbb{R}^d$ with probability law $\mu$, then $(\delta_{x_k})$ is an i.i.d. sequence in $\mathcal{P}(\mathbb{R}^d)$ and its empirical 
mean is just $\mu_{(x_1,\dots,x_n)}$, the empirical measure of an i.i.d. sample of size $n$. That means that 
the extension of Cramér's theorem to Polish spaces may be applied. This theorem is known as Sanov 
theorem. 

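Let us first recall the definition of the relative entropy with respect to a probability measure 
$\mu$ on $\mathbb{R}^d$. 

Definition 19 Let $\mu$ be a probability measure on $\mathbb{R}^d$. We define a convex function $\nu \in \mathcal{P}(\mathbb{R}^d) \to 
I(\nu,\mu) \in \bar{\mathbb{R}}$ by: 

$$\begin{cases} I(\nu,\mu) = \int \log\frac{d\nu}{d\mu}(\xi)\,d\nu(\xi) & \text{if } \nu \ll \mu \\ I(\nu,\mu) = \infty & \text{else} \end{cases} \qquad (67)$$

This function is called the relative entropy with respect to $\mu$. 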
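
For laws on a finite set the relative entropy reduces to a finite sum, which makes the definition easy to experiment with. The sketch below is an illustration under that assumption (laws given as lists of probabilities on a common finite set); the function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(nu, mu):
    """I(nu, mu) = sum_i nu_i log(nu_i / mu_i) on a finite set; infinite if nu is not << mu."""
    total = 0.0
    for nu_i, mu_i in zip(nu, mu):
        if nu_i == 0.0:
            continue            # convention 0 log 0 = 0
        if mu_i == 0.0:
            return math.inf     # nu charges a point that mu does not
        total += nu_i * math.log(nu_i / mu_i)
    return total


print(relative_entropy([0.7, 0.3], [0.5, 0.5]))  # positive
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # zero when nu = mu
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))  # infinite: nu is not absolutely continuous
```

For two-point laws $\nu = (a, 1-a)$ and $\mu = (q, 1-q)$ the value is $a\log\frac{a}{q} + (1-a)\log\frac{1-a}{1-q}$, the same expression as the Cramér rate function of a Bernoulli sample, in line with the fact that Sanov theorem is obtained from the extension of Cramér's theorem.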



Then we may state the Sanov theorem [21], [18]. 

Theorem 24 The sequence of empirical measures $\mu_n$ which are associated to size $n$ i.i.d. samples 
of a probability law $\mu$ on $\mathbb{R}^d$ satisfies a full LDP with the relative entropy with respect to $\mu$ as the 
rate function. 